Yuanhao Luo has uploaded a new patch set (#2).

Change subject: IMPALA-2428: Support multiple-character string as the field delimiter
......................................................................

IMPALA-2428: Support multiple-character string as the field delimiter

This commit adds support for a multi-byte string as the field delimiter.
Meanwhile, the other separators (e.g. escape char, line delimiter and
key-map delimiter) are still limited to a single byte.

TODO:
Because SSE4_2 doesn't support multi-byte matching, this commit supports
the multi-byte field delimiter via direct string matching. As a result,
performance is likely to be poor when the multi-byte field delimiter is
relatively long. A better string matching algorithm such as KMP might
give better performance.

Change-Id: Id1437ca35dc4fdc58a7db1c2c70d4da30adf0c3e
---
M be/src/exec/delimited-text-parser-test.cc
M be/src/exec/delimited-text-parser.cc
M be/src/exec/delimited-text-parser.h
M be/src/exec/delimited-text-parser.inline.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/hdfs-text-table-writer.h
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsStorageDescriptor.java
14 files changed, 202 insertions(+), 67 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/14/3314/2
--
To view, visit http://gerrit.cloudera.org:8080/3314
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id1437ca35dc4fdc58a7db1c2c70d4da30adf0c3e
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Yuanhao Luo <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
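
For readers unfamiliar with the trade-off named in the TODO, here is a rough sketch of what "direct string matching" for a multi-byte field delimiter can look like. It is not the code in this patch: SplitFields is a hypothetical standalone helper, and it ignores escape characters, line delimiters and everything else the real DelimitedTextParser has to handle. It only shows why the cost grows with the delimiter length.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper, not part of Impala: splits one line into fields using
// a delimiter of one or more bytes, testing for the delimiter at each offset.
std::vector<std::string> SplitFields(const std::string& line,
                                     const std::string& delim) {
  std::vector<std::string> fields;
  if (delim.empty()) {  // Assumption: an empty delimiter means "do not split".
    fields.push_back(line);
    return fields;
  }
  size_t field_start = 0;
  size_t pos = 0;
  while (pos + delim.size() <= line.size()) {
    // Direct comparison: up to delim.size() bytes are examined per offset,
    // so the worst case is O(line.size() * delim.size()).
    if (std::memcmp(line.data() + pos, delim.data(), delim.size()) == 0) {
      fields.push_back(line.substr(field_start, pos - field_start));
      pos += delim.size();
      field_start = pos;
    } else {
      ++pos;
    }
  }
  // Whatever follows the last delimiter is the final field (possibly empty).
  fields.push_back(line.substr(field_start));
  return fields;
}
```

With this sketch, SplitFields("a||b||c", "||") yields the three fields "a", "b" and "c". An algorithm such as KMP would avoid re-examining bytes after a partial match, which is where the improvement suggested in the TODO would come from.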
