Skye Wanderman-Milne has posted comments on this change.

Change subject: IMPALA-1578: fix text scanner to handle "\r\n" delimiters split across blocks
......................................................................
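
For readers without the patch at hand, the problem named in the change subject can be illustrated with a minimal Python sketch. This is not the Impala scanner code: `split_records` and the block inputs are hypothetical names chosen for illustration, and it assumes that "\n", "\r", and "\r\n" each terminate a record. The point is that a "\r" at the end of one block has to be remembered so that a "\n" at the start of the next block is not counted as a second delimiter.

```python
def split_records(blocks):
    """Split a sequence of text blocks into records.

    Hypothetical helper, for illustration only. Treats "\n", "\r" and
    "\r\n" as record delimiters, including a "\r\n" whose two characters
    land in different blocks.
    """
    records = []
    current = []      # characters of the record being built
    skip_lf = False   # True right after a "\r", possibly across a block boundary
    for block in blocks:
        for ch in block:
            if skip_lf:
                skip_lf = False
                if ch == "\n":
                    # Second half of a "\r\n" pair: already handled by the "\r".
                    continue
            if ch == "\r":
                records.append("".join(current))
                current = []
                skip_lf = True
            elif ch == "\n":
                records.append("".join(current))
                current = []
            else:
                current.append(ch)
    if current:
        # Trailing record with no terminating delimiter.
        records.append("".join(current))
    return records


# The "\r\n" after "abc" is split across the two blocks.
print(split_records(["abc\r", "\nde\r\nfg"]))  # ['abc', 'de', 'fg']
```

Because `skip_lf` lives outside the per-block loop, the split case and the unsplit case take the same code path. A scanner that reset this state at every block boundary would emit an extra empty record here.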

Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/2803/2/be/src/exec/hdfs-text-scanner.cc
File be/src/exec/hdfs-text-scanner.cc:

Line 613: If so, the tuple after it is considered
        : // part of the next scan range
> what if this isn't the last buffer of this scan range? i.e. why don't we ha

Good catch, we need to handle the non-eosr case differently.


http://gerrit.cloudera.org:8080/#/c/2803/2/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

Line 441: "/
> get_fs_path()

Done


Line 456: assert result.data == ['abc', 'de', 'fg', 'hij', 'klm', 'no']
> could we also test cases where the \r\n span a buffer but not a scan range?

Done. Initially the new test actually passed due to another bug (with or without this fix), which I'll post a patch for soon. I'll post this updated patch for now though and you can imagine that it works as expected :)


-- 
To view, visit http://gerrit.cloudera.org:8080/2803
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id42b441674bb21517ad2788b99942a4b5dc55420
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Skye Wanderman-Milne <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
