[Impala-CR](cdh5-trunk) IMPALA-2428: Support multiple-character string as the field delimiter

Jim Apple (Code Review) Mon, 20 Jun 2016 10:06:13 -0700

Jim Apple has posted comments on this change.

Change subject: IMPALA-2428: Support multiple-character string as the field 
delimiter
......................................................................



Patch Set 1:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/3314/1//COMMIT_MSG
Commit Message:

PS1, Line 11: allowed to have only one byte.
"allowed to have only one byte" and "only allowed to have one byte" mean 
slightly different things. I assume you mean the latter?


http://gerrit.cloudera.org:8080/#/c/3314/1/be/src/exec/delimited-text-parser-test.cc
File be/src/exec/delimited-text-parser-test.cc:

Can you also add some end-to-end tests, please?


PS1, Line 82: filed
"field"


http://gerrit.cloudera.org:8080/#/c/3314/1/be/src/exec/delimited-text-parser.cc
File be/src/exec/delimited-text-parser.cc:

Line 46:   if (field_delim.size() == 1) {
How should DelimitedTextParse behave if field_delim.size() == 0?


Line 145:           || **byte_buffer_ptr == collection_item_delim_) {
This condition should probably go first, since it is cheaper.


http://gerrit.cloudera.org:8080/#/c/3314/1/be/src/exec/delimited-text-parser.h
File be/src/exec/delimited-text-parser.h:

Line 46:       char tuple_delim, const string& field_delim_ = string("\0", 1),
Are there any restrictions about field_delim_? For instance, is the string 
"\0\0" allowed? Can tuple_delim be the first character of field_delim_? What 
about the second?


Line 153:   /// SSE(xmm) register containing the tuple search character(s).
This and the other comments about SSE should be clarified to explain how they 
behave when field_delim_ is not a single character.


http://gerrit.cloudera.org:8080/#/c/3314/1/be/src/exec/hdfs-sequence-table-writer.cc
File be/src/exec/hdfs-sequence-table-writer.cc:

Line 202:         || str_val->ptr[i] == escape_char_) {
This condition should probably go first, since it is cheaper


http://gerrit.cloudera.org:8080/#/c/3314/1/be/src/exec/hdfs-text-table-writer.cc
File be/src/exec/hdfs-text-table-writer.cc:

Line 210:           || str_val->ptr[i] == escape_char_)) {
This condition should probably go first, since it is easier to check.


http://gerrit.cloudera.org:8080/#/c/3314/1/fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java
File fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java:

Line 268:     String fieldDelim = 
StringEscapeUtils.unescapeJava(rowFormat_.getFieldDelimiter());
Can you explain why it makes sense to use of unescapeJava here? I notice it 
isn't used elsewhere in the analyzer.


http://gerrit.cloudera.org:8080/#/c/3314/1/fe/src/main/java/com/cloudera/impala/catalog/HdfsStorageDescriptor.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsStorageDescriptor.java:

PS1, Line 44: /h
missing space between / and h


Line 171:     if (escapeChar == (byte)fieldDelim.charAt(0) ||
Should you check before here that fieldDelim is not null and has length > 0?


-- 
To view, visit http://gerrit.cloudera.org:8080/3314
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id1437ca35dc4fdc58a7db1c2c70d4da30adf0c3e
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Yuanhao Luo <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-HasComments: Yes

[Impala-CR](cdh5-trunk) IMPALA-2428: Support multiple-character string as the field delimiter

Reply via email to