abhioncbr commented on code in PR #11218:
URL: https://github.com/apache/pinot/pull/11218#discussion_r1278588943
##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -605,30 +615,30 @@ static boolean isValidPropertyValue(String value) {
@VisibleForTesting
static String getValidPropertyValue(String value, boolean isMax, DataType
dataType) {
String valueWithinLengthLimit = getValueWithinLengthLimit(value, isMax,
dataType);
- int length = valueWithinLengthLimit.length();
+ String escapedValue = StringEscapeUtils.escapeJava(valueWithinLengthLimit);
+ int length = escapedValue.length();
if (length > 0) {
// check and replace first character if it's a white space
- if (Character.isWhitespace(valueWithinLengthLimit.charAt(0))) {
- String unicodeValue = "\\u" + String.format("%04x", (int)
valueWithinLengthLimit.charAt(0));
- valueWithinLengthLimit = unicodeValue +
valueWithinLengthLimit.substring(1);
+ if (Character.isWhitespace(escapedValue.charAt(0))) {
Review Comment:
I did further analysis, are here are my findings
- we only need the unescapeJava in `ColumnMetadataImpl` for converting back
the Unicode whitespace character into the original value we are adding
[here](https://github.com/apache/pinot/pull/11218/files#diff-f77f250efb4d56b596a7991f65ff00139fb73463111619d64741bec4306efa1aR626)
and
[here](https://github.com/apache/pinot/pull/11218/files#diff-f77f250efb4d56b596a7991f65ff00139fb73463111619d64741bec4306efa1aR633).
For comma it's already handled in properties-configuration enhanced
[unescapeJava](https://github.com/apache/commons-configuration/blob/RELEASE_1_10_BRANCH/src/main/java/org/apache/commons/configuration/PropertiesConfiguration.java#L1376)
method.
IMO, let's avoid using escapeJava and unescapeJava. We don't need to
introduce it just for handling the leading or trailing whitespace. For handling
leading or trailing whitespace, here are a couple of options.
1. Instead of replacing the whitespace character value with its Unicode
value, we can add an extra `\` at the start or end. We will remove this
additional `/` based on the flag property( say `isWhitespacePreserved`)
value(since we are already good with escape flag property)
2. The correct way of handling this is using [custom properties readers and
writers](https://commons.apache.org/proper/commons-configuration/userguide/howto_properties.html).
However, I think this requires properties-configuration version `2.x`. We have
already identified the [work](https://github.com/apache/pinot/issues/11085), so
we can wait for the upgrade to be done and drop the handling of whitespace for
now.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]