Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-1578: fix text scanner to handle "\r\n" delimiters split across blocks
......................................................................

IMPALA-1578: fix text scanner to handle "\r\n" delimiters split across blocks

This patch modifies HdfsTextScanner to specifically check for split
"\r\n" delimiters when the scan range ends with '\r'. If there does
turn out to be a split delimiter, the next tuple is considered the
responsibility of the next scan range's scanner, as if the delimiter
appeared fully in the second scan range. This should not affect the
overall performance characteristics of the text scanner since it
already must do a remote read past the end of the scan range to read
the last tuple.

Change-Id: Id42b441674bb21517ad2788b99942a4b5dc55420
Reviewed-on: http://gerrit.cloudera.org:8080/2803
Reviewed-by: Dan Hecht <[email protected]>
Tested-by: Internal Jenkins
---
M be/src/exec/delimited-text-parser.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M tests/query_test/test_scanners.py
4 files changed, 171 insertions(+), 16 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Dan Hecht: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/2803
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Id42b441674bb21517ad2788b99942a4b5dc55420
Gerrit-PatchSet: 9
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Skye Wanderman-Milne <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
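
For readers unfamiliar with the ownership rule described in the commit message above, here is a toy Python model of it. This is only a sketch and not the HdfsTextScanner code: the names scan_range, _next_delim and _delim_last are hypothetical, the whole file is held in memory as bytes, and rows are assumed to end in "\n", "\r" or "\r\n". Peeking at the byte just past the end of the range stands in for the read past the scan range that the scanner already performs.

```python
CR, LF = 0x0D, 0x0A  # byte values of '\r' and '\n'


def _next_delim(data, pos, limit):
    """Index of the first CR or LF byte in data[pos:limit], or -1 if none."""
    for i in range(pos, limit):
        if data[i] in (CR, LF):
            return i
    return -1


def _delim_last(data, i):
    """Index of the last byte of the row delimiter that starts at i.

    This may look one byte past the caller's scan range, which is how a
    "\r\n" pair split across two ranges is detected.
    """
    if data[i] == CR and i + 1 < len(data) and data[i + 1] == LF:
        return i + 1
    return i


def scan_range(data, start, length):
    """Rows owned by the scan range [start, start + length).

    Assumed rule for this sketch: a row belongs to the range that holds
    the LAST byte of the delimiter preceding it. A "\r\n" split across
    two ranges therefore counts as lying fully in the second range.
    """
    end = min(start + length, len(data))
    pos = start
    if start > 0:
        # Skip the partial row at the front; an earlier range owns it.
        first = _next_delim(data, start, end)
        if first < 0:
            return []
        last = _delim_last(data, first)
        if last >= end:
            return []  # split delimiter: the next range owns what follows
        pos = last + 1
    rows = []
    while pos < len(data):
        j = _next_delim(data, pos, len(data))  # may read past `end`
        if j < 0:
            rows.append(data[pos:])  # final row without a trailing delimiter
            break
        rows.append(data[pos:j])
        last = _delim_last(data, j)
        if last >= end:
            break  # the delimiter ends outside this range; stop here
        pos = last + 1
    return rows
```

With data = b"a,1\r\nb,2\r\nc,3\r\n" and the file cut between the first '\r' and '\n' (offset 4), scan_range(data, 0, 4) stops after the row a,1 and scan_range(data, 4, 11) picks up b,2 and c,3, so no row is emitted twice or dropped.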
