Blazer-007 commented on code in PR #4133:
URL: https://github.com/apache/gobblin/pull/4133#discussion_r2314377774


##########
gobblin-api/src/main/java/org/apache/gobblin/compat/hadoop/TextSerializer.java:
##########
@@ -31,20 +30,21 @@ public class TextSerializer {
    * Serialize a String using the same logic as a Hadoop Text object
    */
   public static void writeStringAsText(DataOutput stream, String str) throws IOException {
-    byte[] utf8Encoded = str.getBytes(StandardCharsets.UTF_8);
-    writeVLong(stream, utf8Encoded.length);
-    stream.write(utf8Encoded);
+    writeVLong(stream, str.length());
+    stream.writeBytes(str);

Review Comment:
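   I think this is a good suggestion. Note that `DataOutput.writeBytes` writes only the low-order byte of each character and ignores the high-order eight bits:
https://www.cs.helsinki.fi/group/boi2016/doc/java/api/java/io/DataOutput.html#writeBytes-java.lang.String-

   Should we add handling for non-ASCII input as well, @thisisArjit? For example: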
   
   ```java
   // Reject anything outside 7-bit ASCII: writeBytes keeps only the low-order byte of each char.
   for (int i = 0; i < str.length(); i++) {
       if (str.charAt(i) > 0x7F) {
           throw new IllegalArgumentException("Non-ASCII character detected.");
       }
   }
   writeVLong(stream, str.length());
   stream.writeBytes(str); // writes 1 byte per character
   ```
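
To show why such a guard might matter, here is a minimal, self-contained sketch comparing the bytes produced by `DataOutputStream.writeBytes` with the UTF-8 encoding used by the removed code. The class and method names (`WriteBytesDemo`, `viaWriteBytes`, `isAscii`) are hypothetical and not part of Gobblin; `writeVLong` is left out because only the payload bytes are of interest here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Gobblin.
class WriteBytesDemo {

  /** Returns the bytes that DataOutput.writeBytes emits: the low-order byte of each char. */
  public static byte[] viaWriteBytes(String str) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      out.writeBytes(str);
    }
    return buffer.toByteArray();
  }

  /** Returns true when every char is 7-bit ASCII, i.e. when writeBytes and UTF-8 agree. */
  public static boolean isAscii(String str) {
    for (int i = 0; i < str.length(); i++) {
      if (str.charAt(i) > 0x7F) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws IOException {
    String ascii = "gobblin";
    String nonAscii = "caf\u00e9"; // "cafe" with an accented final letter (U+00E9)

    for (String s : new String[] {ascii, nonAscii}) {
      byte[] lowBytes = viaWriteBytes(s);
      byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
      System.out.println("isAscii=" + isAscii(s)
          + " writeBytes=" + lowBytes.length + " bytes"
          + " utf8=" + utf8.length + " bytes"
          + " identical=" + Arrays.equals(lowBytes, utf8));
    }
  }
}
```

For pure ASCII input the two encodings are byte-for-byte identical, so `str.length()` is also the correct byte count. For anything else the payload is no longer valid UTF-8 and the length prefix may disagree with the UTF-8 length, which is why rejecting such input up front (or falling back to the UTF-8 path) seems worth considering.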


