Tim Armstrong has posted comments on this change.

Change subject: IMPALA-2700: ASCII NUL characters are doubled on insert into text tables
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/3703/1/be/src/exec/hdfs-text-table-writer.cc
File be/src/exec/hdfs-text-table-writer.cc:

PS1, Line 208: str_val->ptr[i] == field_delim_
I don't think we want to escape field delimiters if the escape char is '\0'.


Line 208:     if (UNLIKELY(str_val->ptr[i] == field_delim_ || (str_val->ptr[i] == escape_char_ &&
As discussed, I think we should separate out the escaped and unescaped code 
paths: it'll be easier to follow and will perform better. As-is, we're checking 
the characters one-by-one for the escape character and delimiter, then ignoring 
them once we find them.

I.e.

if (escape_char_ == '\0') {
  // ... Just copy str_val into the string verbatim ...
} else {
  // ... Run the existing code ...
}
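
To make the suggested split concrete, here is a minimal standalone sketch. It
is not the Impala code: AppendField and its parameters are hypothetical
stand-ins for the writer's field_delim_ and escape_char_ members, and the
escaped branch only assumes that the existing loop prefixes delimiters and
escape characters with the escape character.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not Impala's API: appends one string field to `out`.
// `field_delim` and `escape_char` stand in for the writer's members.
void AppendField(const char* ptr, std::size_t len, char field_delim,
                 char escape_char, std::string* out) {
  if (escape_char == '\0') {
    // No escape character configured: copy the bytes verbatim, including
    // any embedded NULs or field delimiters.
    out->append(ptr, len);
    return;
  }
  // Escaping enabled: prefix delimiters and escape characters with the
  // escape character (assumed to mirror the existing per-character loop).
  for (std::size_t i = 0; i < len; ++i) {
    if (ptr[i] == field_delim || ptr[i] == escape_char) {
      out->push_back(escape_char);
    }
    out->push_back(ptr[i]);
  }
}
```

With this shape the '\0' check happens once per string rather than once per
character, and the unescaped path never inspects individual bytes.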


-- 
To view, visit http://gerrit.cloudera.org:8080/3703
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia30fa314d1ee1e99f9e7598466eb1570ca7940fc
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: anujphadke <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
