Yuanhao Luo has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3314

Change subject: IMPALA-2428: Support multiple-character string as the field 
delimiter
......................................................................

IMPALA-2428: Support multiple-character string as the field delimiter

This commit add support for multi-byte string as the field delimiter.
Mean while other separators(e.g. escape char, line delimiter and key-map
delimiter) are allowed to have only one byte.

TODO: Thinking that SSE4_2 doesn't support multi-byte matching, this
commit supports multi-byte field delimiter via direct string matching.
As a result, we would get poor performance if the multi-byte field
delimiter is relatively long. Maybe we can get better performance via
better string matching algorithm such as KMP.

Change-Id: Id1437ca35dc4fdc58a7db1c2c70d4da30adf0c3e
---
M be/src/exec/delimited-text-parser-test.cc
M be/src/exec/delimited-text-parser.cc
M be/src/exec/delimited-text-parser.h
M be/src/exec/delimited-text-parser.inline.h
M be/src/exec/hdfs-sequence-table-writer.cc
M be/src/exec/hdfs-sequence-table-writer.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/hdfs-text-table-writer.h
M be/src/runtime/descriptors.h
M common/thrift/CatalogObjects.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/com/cloudera/impala/analysis/CreateTableStmt.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsStorageDescriptor.java
14 files changed, 139 insertions(+), 57 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/14/3314/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3314
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Id1437ca35dc4fdc58a7db1c2c70d4da30adf0c3e
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Yuanhao Luo <[email protected]>

Reply via email to