Henry Robinson has uploaded a new patch set (#3).

Change subject: IMPALA-3804: Push per-split filtering into scanners
......................................................................

IMPALA-3804: Push per-split filtering into scanners

IMPALA-3798 was a bug that occurred when a header split was filtered out without correctly cancelling all the scan ranges in the rest of the file. To fix this properly, we have to make the scanners aware of per-split filtering, since different scanners need to compensate for a filtered scan in different ways.

Sequence-based scanners (such as Avro) issue most of a file's ranges only after the header range has been scanned. Therefore, if a header split is filtered out, all the remaining ranges can be safely marked as complete. If a non-header split is filtered, it may not be safe to mark as complete a split that may be concurrently scanned by a different scanner.

Text and Parquet scanners issue all their ranges up front, so even if one split in a file is filtered, all other splits will still be processed.

Testing:
* Added logic to test_sequence_file_filtering_race to check that per-scan filtering was happening correctly, confirming that the rewritten path was taking effect. Expanded the test to hit all scanner types.
* Manually tested the existing runtime filters test suite with file filtering disabled and with the tests rewritten to expect split filtering instead of file filtering. Tests passed.
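
For illustration only, the per-scanner behaviour described above could be sketched roughly as follows. This is not the code in the patch: ScannerKind, ranges_to_mark_complete and the integer range indices are hypothetical names made up for this sketch, and treating a filtered non-header split as "skip only that split" is one reading of the description, not something the commit message states outright.

```python
from enum import Enum


class ScannerKind(Enum):
    # Hypothetical classification, not an Impala type.
    SEQUENCE = "sequence"  # e.g. Avro: most ranges are issued after the header is scanned
    UPFRONT = "upfront"    # e.g. text, Parquet: all ranges are issued up front


def ranges_to_mark_complete(kind, num_ranges, filtered_idx, is_header):
    """Return the indices of the ranges that may be marked complete when the
    split at filtered_idx is filtered out.

    Assumption: ranges of one file are numbered 0..num_ranges-1.
    """
    if kind is ScannerKind.SEQUENCE and is_header:
        # The remaining ranges have not been issued yet, so no other scanner
        # can be working on them: the whole file can be closed out.
        return set(range(num_ranges))
    # Otherwise other splits may already be in flight (or were issued up
    # front), so only the filtered split itself is skipped.
    return {filtered_idx}


if __name__ == "__main__":
    print(sorted(ranges_to_mark_complete(ScannerKind.SEQUENCE, 4, 0, True)))
    print(sorted(ranges_to_mark_complete(ScannerKind.SEQUENCE, 4, 2, False)))
    print(sorted(ranges_to_mark_complete(ScannerKind.UPFRONT, 4, 0, True)))
```

The point of the sketch is only that the decision depends on the scanner type, which is why the filtering check has to live in the scanners rather than in the scan node.
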

Change-Id: I9f92178f642695e0e9ef901373a5e9f2878a78ce
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M be/src/exec/scanner-context.h
M tests/custom_cluster/test_seq_file_filtering.py
10 files changed, 106 insertions(+), 106 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/61/3561/3
--
To view, visit http://gerrit.cloudera.org:8080/3561
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f92178f642695e0e9ef901373a5e9f2878a78ce
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Henry Robinson <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Henry Robinson <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
