abhioncbr commented on code in PR #11218:
URL: https://github.com/apache/pinot/pull/11218#discussion_r1278588943


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -605,30 +615,30 @@ static boolean isValidPropertyValue(String value) {
   @VisibleForTesting
   static String getValidPropertyValue(String value, boolean isMax, DataType 
dataType) {
     String valueWithinLengthLimit = getValueWithinLengthLimit(value, isMax, 
dataType);
-    int length = valueWithinLengthLimit.length();
+    String escapedValue = StringEscapeUtils.escapeJava(valueWithinLengthLimit);
+    int length = escapedValue.length();
 
     if (length > 0) {
       // check and replace first character if it's a white space
-      if (Character.isWhitespace(valueWithinLengthLimit.charAt(0))) {
-        String unicodeValue = "\\u" + String.format("%04x", (int) 
valueWithinLengthLimit.charAt(0));
-        valueWithinLengthLimit = unicodeValue + 
valueWithinLengthLimit.substring(1);
+      if (Character.isWhitespace(escapedValue.charAt(0))) {

Review Comment:
   I did further analysis, are here are my findings
   - we only need the unescapeJava in `ColumnMetadataImpl` for converting back 
the Unicode whitespace character into the original value we are adding 
[here](https://github.com/apache/pinot/pull/11218/files#diff-f77f250efb4d56b596a7991f65ff00139fb73463111619d64741bec4306efa1aR626)
 and 
[here](https://github.com/apache/pinot/pull/11218/files#diff-f77f250efb4d56b596a7991f65ff00139fb73463111619d64741bec4306efa1aR633).
 For comma it's already handled in properties-configuration enhanced 
[unescapeJava](https://github.com/apache/commons-configuration/blob/RELEASE_1_10_BRANCH/src/main/java/org/apache/commons/configuration/PropertiesConfiguration.java#L1376)
 method.
   
   IMO, let's avoid using escapeJava and unescapeJava. We don't need to 
introduce it just for handling the leading or trailing whitespace. For handling 
leading or trailing whitespace, here are a couple of options. 
   
   1. Instead of replacing the whitespace character value with its Unicode 
value, we can add an extra `\` at the start or end. We will remove this 
additional `/` based on the flag property( say `isWhitespacePreserved`) 
value(since we are already good with escape flag property)
   2. The correct way of handling this is using [custom properties readers and 
writers](https://commons.apache.org/proper/commons-configuration/userguide/howto_properties.html).
 However, I think this requires properties-configuration version `2.x`. We have 
already identified the [work](https://github.com/apache/pinot/issues/11085), so 
we can wait for the upgrade to be done and drop the handling of whitespace for 
now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to