abhioncbr commented on code in PR #11218:
URL: https://github.com/apache/pinot/pull/11218#discussion_r1278470408


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/SegmentColumnarIndexCreator.java:
##########
@@ -605,30 +615,30 @@ static boolean isValidPropertyValue(String value) {
   @VisibleForTesting
   static String getValidPropertyValue(String value, boolean isMax, DataType 
dataType) {
     String valueWithinLengthLimit = getValueWithinLengthLimit(value, isMax, 
dataType);
-    int length = valueWithinLengthLimit.length();
+    String escapedValue = StringEscapeUtils.escapeJava(valueWithinLengthLimit);
+    int length = escapedValue.length();
 
     if (length > 0) {
       // check and replace first character if it's a white space
-      if (Character.isWhitespace(valueWithinLengthLimit.charAt(0))) {
-        String unicodeValue = "\\u" + String.format("%04x", (int) 
valueWithinLengthLimit.charAt(0));
-        valueWithinLengthLimit = unicodeValue + 
valueWithinLengthLimit.substring(1);
+      if (Character.isWhitespace(escapedValue.charAt(0))) {

Review Comment:
   Here is my understanding of the problem.
   
   - If the original value has leading and trailing whitespace characters, we 
must escape them to get the same value from the saved property file. This issue 
happens because of the String method `trim` usage. Here are the 
properties-configuration codepoints, reading the entire 
[line](https://github.com/apache/commons-configuration/blob/RELEASE_1_10_BRANCH/src/main/java/org/apache/commons/configuration/PropertiesConfiguration.java#L748)
 and value 
[parse](https://github.com/apache/commons-configuration/blob/RELEASE_1_10_BRANCH/src/main/java/org/apache/commons/configuration/PropertiesConfiguration.java#L927)
   
   - Here is the valid example code for which it fails with the use of escaping 
as well, 
   ```java
   public static void main(String[] args)
         throws ConfigurationException {
       writeProperty();
       readProperty();
     }
   
     public static void writeProperty()
         throws ConfigurationException {
       // case 1: fails for leading space, we need to preserve the leading 
space.
       PropertiesConfiguration configuration = new PropertiesConfiguration(new 
File("test"));
   
       String valueWithLeadingSpace = "  valueWithLeadingSpace";
       String escapedValueWithLeadingSpace = 
StringEscapeUtils.escapeJava(valueWithLeadingSpace);
       System.out.println(escapedValueWithLeadingSpace); // output: '  
valueWithLeadingSpace'
       configuration.setProperty("valueWithLeadingSpace", 
escapedValueWithLeadingSpace);
   
   
       // case 2: fails for comma, we need to escape commas
       String valueWithComma = "value,with,comma";
       String escapedValueWithComma =  
StringEscapeUtils.escapeJava(valueWithComma);
       System.out.println(escapedValueWithComma); // output: 'value,with,comma'
       configuration.setProperty("valueWithComma", escapedValueWithComma);
   
   
       configuration.save();
     }
   
     public static void readProperty()
         throws ConfigurationException {
       PropertiesConfiguration configuration = new PropertiesConfiguration(new 
File("test"));
   
       String valueFromPropertyFile = (String) 
configuration.getProperty("valueWithLeadingSpace");
       System.out.println(valueFromPropertyFile);  // output: 
'valueWithLeadingSpace'
       String unescapedValueWithLeadingSpace = 
StringEscapeUtils.unescapeJava(valueFromPropertyFile);
       System.out.println(unescapedValueWithLeadingSpace); // output: 
'valueWithLeadingSpace'
   
       String valueWithCommaFromPropertyFile = (String) 
configuration.getProperty("valueWithComma"); // throws exception here 
considering it as array
       System.out.println(valueWithCommaFromPropertyFile);
       String unescapedValueWithCommaFromPropertyFile = 
StringEscapeUtils.unescapeJava(valueWithCommaFromPropertyFile);
       System.out.println(unescapedValueWithCommaFromPropertyFile);
     }
   ```
   For handling the above cases, we are adding the Unicode-based string value 
of the whitespace character and unescaping it while reading it back.
   
   please let me know, if I am missing something or my interpretation is 
incorrect. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to