Blazer-007 commented on code in PR #4133: URL: https://github.com/apache/gobblin/pull/4133#discussion_r2314377774
##########
gobblin-api/src/main/java/org/apache/gobblin/compat/hadoop/TextSerializer.java:
##########
@@ -31,20 +30,21 @@ public class TextSerializer {
    * Serialize a String using the same logic as a Hadoop Text object
    */
   public static void writeStringAsText(DataOutput stream, String str) throws IOException {
-    byte[] utf8Encoded = str.getBytes(StandardCharsets.UTF_8);
-    writeVLong(stream, utf8Encoded.length);
-    stream.write(utf8Encoded);
+    writeVLong(stream, str.length());
+    stream.writeBytes(str);

Review Comment:
   I think this is a good suggestion - https://www.cs.helsinki.fi/group/boi2016/doc/java/api/java/io/DataOutput.html#writeBytes-java.lang.String-
   Should we have some handling for this as well? @thisisArjit

   ```
   for (int i = 0; i < str.length(); i++) {
     if (str.charAt(i) > 0x7F) {
       throw new IllegalArgumentException("Non-ASCII character detected.");
     }
   }
   writeVLong(stream, str.length());
   stream.writeBytes(str); // writes 1 byte per character
   ```
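   For context, here is a minimal standalone sketch of why the guard may be needed. It compares what `DataOutput.writeBytes` emits with the UTF-8 encoding used by the removed lines. The class and method names (`AsciiLengthDemo`, `viaWriteBytes`, `viaUtf8`, `isAscii`) are hypothetical and are not part of Gobblin; only `java.io` and `java.nio.charset` calls are used.

   ```java
   import java.io.ByteArrayOutputStream;
   import java.io.DataOutputStream;
   import java.io.IOException;
   import java.io.UncheckedIOException;
   import java.nio.charset.StandardCharsets;

   // Hypothetical demo class for illustration only; not part of Gobblin.
   final class AsciiLengthDemo {

     // Bytes emitted by DataOutput.writeBytes: one byte per char,
     // with the high-order eight bits of each char discarded.
     static byte[] viaWriteBytes(String str) {
       ByteArrayOutputStream buffer = new ByteArrayOutputStream();
       try (DataOutputStream out = new DataOutputStream(buffer)) {
         out.writeBytes(str);
       } catch (IOException e) {
         // Not expected for an in-memory stream; rethrown unchecked to keep the demo simple.
         throw new UncheckedIOException(e);
       }
       return buffer.toByteArray();
     }

     // Bytes emitted by the removed code path: the full UTF-8 encoding.
     static byte[] viaUtf8(String str) {
       return str.getBytes(StandardCharsets.UTF_8);
     }

     // Same condition as the loop proposed in this comment, returned as a boolean.
     static boolean isAscii(String str) {
       for (int i = 0; i < str.length(); i++) {
         if (str.charAt(i) > 0x7F) {
           return false;
         }
       }
       return true;
     }

     public static void main(String[] args) {
       // "caf\u00e9" ends in U+00E9; "\u20ac" is the euro sign.
       for (String s : new String[] {"gobblin", "caf\u00e9", "\u20ac"}) {
         System.out.printf("ascii=%b writeBytes=%d bytes utf8=%d bytes%n",
             isAscii(s), viaWriteBytes(s).length, viaUtf8(s).length);
       }
     }
   }
   ```

   For pure ASCII input the two paths produce identical bytes, so `str.length()` is the correct length prefix. For `"caf\u00e9"`, `writeBytes` emits 4 bytes while UTF-8 needs 5, and for `"\u20ac"` it emits the single byte `0xAC` instead of the 3-byte UTF-8 sequence, so a reader expecting Hadoop `Text` semantics would presumably decode something different from what was written.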