Tim Armstrong has posted comments on this change.

Change subject: IMPALA-2700: ASCII NUL characters are doubled on insert into text tables
......................................................................

Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3703/1/be/src/exec/hdfs-text-table-writer.cc
File be/src/exec/hdfs-text-table-writer.cc:

PS1, Line 208: str_val->ptr[i] == field_delim_
I don't think we want to escape field delimiters if the escape char is '\0'.

Line 208: if (UNLIKELY(str_val->ptr[i] == field_delim_ || (str_val->ptr[i] == escape_char_ &&
As discussed, I think we should separate out the escaped and unescaped code paths - it'll be easier to follow and perform better. As-is we're checking the characters one-by-one for the escape character and delimiter, then ignoring them once we find them.

I.e.

  if (escape_char_ == '\0') {
    ... Just copy str_val into the string verbatim ...
  } else {
    ... Run the existing code ...
  }

--
To view, visit http://gerrit.cloudera.org:8080/3703
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia30fa314d1ee1e99f9e7598466eb1570ca7940fc
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: anujphadke <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
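
To make the split suggested in the Line 208 comment concrete, here is a rough standalone sketch. It is not the actual writer code: AppendField and its parameters are hypothetical names, std::string stands in for the StringValue and output buffer used in hdfs-text-table-writer.cc, and it assumes an escape char of '\0' means that no escaping is configured.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for illustration only; not the Impala implementation.
// Appends one string field to 'out', escaping only when an escape char is set.
void AppendField(const std::string& val, char field_delim, char escape_char,
                 std::string* out) {
  assert(out != nullptr);
  if (escape_char == '\0') {
    // Unescaped path (assumption: '\0' means "no escape char configured").
    // Copy the bytes verbatim, so embedded NULs and delimiters are untouched
    // and no per-character checks are needed.
    out->append(val);
    return;
  }
  // Escaped path: prefix each field delimiter and escape char with the
  // escape char.
  for (char c : val) {
    if (c == field_delim || c == escape_char) out->push_back(escape_char);
    out->push_back(c);
  }
}
```

With the paths separated like this, the unescaped case never inspects individual characters, which is where both the readability and the performance benefit would come from.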
