Skye Wanderman-Milne has uploaded a new change for review. http://gerrit.cloudera.org:8080/2803
Change subject: IMPALA-1578: fix text scanner to handle "\r\n" delimiters split across blocks
......................................................................

IMPALA-1578: fix text scanner to handle "\r\n" delimiters split across blocks

This patch modifies HdfsTextScanner to specifically check for split
"\r\n" delimiters when the scan range ends with '\r'. If there does
turn out to be a split delimiter, the next tuple is considered the
responsibility of the next scan range's scanner, as if the delimiter
appeared fully in the second scan range.

This should not affect the overall performance characteristics of the
text scanner since it already must do a remote read past the end of
the scan range to read the last tuple.

Change-Id: Id42b441674bb21517ad2788b99942a4b5dc55420
---
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M tests/query_test/test_scanners.py
3 files changed, 123 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/03/2803/1
--
To view, visit http://gerrit.cloudera.org:8080/2803
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Id42b441674bb21517ad2788b99942a4b5dc55420
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Skye Wanderman-Milne <[email protected]>
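
For readers unfamiliar with the ownership rule in the commit message, here is a minimal C++ sketch of it. This is not the HdfsTextScanner code from the patch: IsSplitCrLf and RowsOwnedBy are hypothetical names, the whole file is held in memory for simplicity, and the sketch assumes that "\r\n", a lone '\n', and a lone '\r' all end a row. It models a row as owned by the scan range containing the last byte of the delimiter before it, so a "\r\n" split across two ranges counts as belonging to the second range.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true when the range ending at range_end (exclusive)
// ends with '\r' and the first byte past the range is '\n'.
bool IsSplitCrLf(const std::string& file, std::size_t range_end) {
  return range_end > 0 && range_end < file.size() &&
         file[range_end - 1] == '\r' && file[range_end] == '\n';
}

// Hypothetical model of row ownership for the scan range [begin, end).
// A row is owned by the range holding the LAST byte of the delimiter that
// precedes it; the first row of the file goes to the range starting at 0.
std::vector<std::string> RowsOwnedBy(const std::string& file,
                                     std::size_t begin, std::size_t end) {
  assert(begin <= end && end <= file.size());
  std::vector<std::string> rows;
  bool owned = (begin == 0 && end > 0);
  std::size_t row_start = 0;
  std::size_t i = 0;
  while (i < file.size()) {
    const char c = file[i];
    if (c != '\r' && c != '\n') {
      ++i;
      continue;
    }
    // Delimiter starts at i; "\r\n" is a single two-byte delimiter.
    std::size_t last = i;
    if (c == '\r' && i + 1 < file.size() && file[i + 1] == '\n') last = i + 1;
    if (owned) rows.push_back(file.substr(row_start, i - row_start));
    // The next row belongs to whichever range contains byte `last`.
    owned = (begin <= last && last < end);
    row_start = last + 1;
    i = row_start;
  }
  // Trailing row with no final delimiter.
  if (owned && row_start < file.size()) rows.push_back(file.substr(row_start));
  return rows;
}
```

Under these assumptions, splitting a file between the '\r' and the '\n' of a delimiter leaves the row before the delimiter with the first range and hands the row after it to the second, so every row is produced exactly once.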
