Henry Robinson has uploaded a new change for review. http://gerrit.cloudera.org:8080/3526
Change subject: IMPALA-3798: Disable per-split filtering for sequence-based scanners
......................................................................

IMPALA-3798: Disable per-split filtering for sequence-based scanners

If a runtime filter rejects the header split of a file in a sequence-based format but not the entire file (which may happen if the filter has not arrived in time), the scanner will never mark all splits for that file complete. This is because BaseSequenceScanner issues the file's remaining scan ranges only after parsing the header split. Until those ranges are processed, RangeComplete() and AddDiskIoRanges() are not called, and those are the methods that update progress_ and num_unqueued_files_ respectively. HdfsScanNode::ScannerThread() reads those variables to decide whether to exit, so it spins forever. The bug therefore only shows up when a file has more than one scan range.

This patch disables per-split filtering for Avro, RC and sequence files until a permanent fix lands that marks all scan ranges for a file as done as soon as one range is filtered out.

Testing: A custom cluster test is added which disables file-level filtering, emulating the race condition that leads to the hang when a query that filters scan ranges is run. Without the fix, this test hangs; with the fix, the query completes as expected. MAX_SCAN_RANGE_LENGTH is used to ensure more than one scan range per file.

Change-Id: I4770dd77fd4258c24115d72b572c727b770bd75d
---
M be/src/common/global-flags.cc
M be/src/exec/hdfs-scan-node.cc
A tests/custom_cluster/test_seq_file_filtering.py
3 files changed, 86 insertions(+), 10 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/26/3526/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3526

Gerrit-MessageType: newchange
Gerrit-Change-Id: I4770dd77fd4258c24115d72b572c727b770bd75d
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.6.0_5.8.0
Gerrit-Owner: Henry Robinson
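To make the hang described in this change concrete, here is a minimal Python sketch of the bookkeeping the commit message describes. It is a hypothetical toy model, not Impala's code: `ScanProgressModel`, `scan_terminates`, and the `per_split_filtering` / `mark_file_done_on_filter` switches are invented for illustration, and the fields only mirror `progress_` and `num_unqueued_files_`. It assumes that a rejected header split is itself still reported complete, and that only files with more than one scan range have ranges left to issue after the header is parsed.

```python
from dataclasses import dataclass, field


@dataclass
class ScanProgressModel:
    """Toy model of the scan-node bookkeeping described in IMPALA-3798.

    Hypothetical: the fields mirror progress_ and num_unqueued_files_ from the
    commit message, but this is not Impala's actual implementation.
    """
    ranges_per_file: dict[str, int]
    progress: int = field(init=False, default=0)            # ~ progress_
    num_unqueued_files: int = field(init=False, default=0)  # ~ num_unqueued_files_
    total_ranges: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        self.total_ranges = sum(self.ranges_per_file.values())
        # Assumption: only files with ranges beyond the header split still
        # have ranges waiting to be issued once the header is parsed.
        self.num_unqueued_files = sum(
            1 for n in self.ranges_per_file.values() if n > 1)

    def range_complete(self, count: int = 1) -> None:  # ~ RangeComplete()
        self.progress += count

    def add_disk_io_ranges(self) -> None:  # ~ AddDiskIoRanges()
        self.num_unqueued_files -= 1

    def done(self) -> bool:
        # Modeled exit condition: every range is complete and no file still
        # has ranges waiting to be issued.
        return (self.progress == self.total_ranges
                and self.num_unqueued_files == 0)


def scan_terminates(ranges_per_file: dict[str, int],
                    filtered_headers: set[str],
                    per_split_filtering: bool = True,
                    mark_file_done_on_filter: bool = False) -> bool:
    """Return True if the modeled exit condition is ever satisfied.

    False corresponds to the scanner thread spinning forever.
    """
    node = ScanProgressModel(dict(ranges_per_file))
    for name, num_ranges in ranges_per_file.items():
        if per_split_filtering and name in filtered_headers:
            # Assumption: the rejected header split is itself reported complete.
            node.range_complete()
            if mark_file_done_on_filter and num_ranges > 1:
                # Permanent-fix idea: retire every remaining range of the file.
                node.add_disk_io_ranges()
                node.range_complete(num_ranges - 1)
            # Otherwise the remaining ranges are never issued or completed.
            continue
        node.range_complete()            # header split parsed
        if num_ranges > 1:
            node.add_disk_io_ranges()    # remaining ranges issued
            node.range_complete(num_ranges - 1)
    return node.done()
```

Under those assumptions, `scan_terminates({"a.seq": 3}, {"a.seq"})` returns False, matching the spin, while a single-range file, disabling per-split filtering as this patch does, or retiring the file's remaining ranges as the proposed permanent fix would do all return True.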
