Alex Behm has submitted this change and it was merged.

Change subject: IMPALA-3646: Handle corrupt RLE literal or repeat counts of 0.
......................................................................

IMPALA-3646: Handle corrupt RLE literal or repeat counts of 0.

Adds handling and testing for a specific Parquet data corruption
scenario with plain dictionary encoded values. The problematic
scenario is when the repeat or literal count of the RLE-encoded
dictionary indexes is decoded as 0 - an invalid value.

There are several other cases of data corruption that are not yet
handled gracefully. This patch only handles one specific case.

Change-Id: Ibf406c82cdded37966f09c81e4cc1446d2b60d63
Reviewed-on: http://gerrit.cloudera.org:8080/3299
Reviewed-by: Alex Behm <[email protected]>
Tested-by: Alex Behm <[email protected]>
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/util/rle-encoding.h
M be/src/util/rle-test.cc
M testdata/data/README
A testdata/data/bad_rle_literal_count.parquet
A testdata/data/bad_rle_repeat_count.parquet
A testdata/workloads/functional-query/queries/QueryTest/parquet-corrupt-rle-counts-abort.test
A testdata/workloads/functional-query/queries/QueryTest/parquet-corrupt-rle-counts.test
M tests/query_test/test_scanners.py
9 files changed, 85 insertions(+), 3 deletions(-)

Approvals:
  Alex Behm: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/3299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ibf406c82cdded37966f09c81e4cc1446d2b60d63
Gerrit-PatchSet: 4
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
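
For readers unfamiliar with the corruption scenario named in the commit message, the following is a minimal standalone sketch of the check, not the code from be/src/util/rle-encoding.h. It assumes the run-header layout of Parquet's RLE/bit-packed hybrid encoding: a ULEB128 varint whose lowest bit selects a literal (bit-packed) run or a repeated run, with the remaining bits giving the number of 8-value groups or the repeat count respectively. The names RunHeader, ReadVarint and ReadRunHeader are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type for this sketch; not part of Impala.
struct RunHeader {
  bool is_literal;  // true: bit-packed literal run, false: repeated run
  uint64_t count;   // number of values covered by the run
};

// Reads a ULEB128 varint of at most 5 bytes starting at *pos.
// Returns false if the buffer ends early or the varint is too long.
bool ReadVarint(const std::vector<uint8_t>& buf, size_t* pos, uint32_t* out) {
  uint32_t result = 0;
  for (int shift = 0; shift < 35; shift += 7) {
    if (*pos >= buf.size()) return false;
    uint8_t byte = buf[(*pos)++];
    result |= static_cast<uint32_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) {
      *out = result;
      return true;
    }
  }
  return false;
}

// Decodes the next run header. Returns false on truncated input or when the
// decoded literal or repeat count is 0, which cannot occur in valid data.
bool ReadRunHeader(const std::vector<uint8_t>& buf, size_t* pos, RunHeader* header) {
  uint32_t indicator = 0;
  if (!ReadVarint(buf, pos, &indicator)) return false;
  header->is_literal = (indicator & 1) != 0;
  uint64_t n = indicator >> 1;
  // Literal runs are written in groups of 8 values; repeated runs store the count directly.
  header->count = header->is_literal ? n * 8 : n;
  return header->count != 0;
}
```

In this sketch a header byte of 0x00 (repeat count 0) or 0x01 (zero literal groups) makes ReadRunHeader return false, so a caller can report the file as corrupt instead of continuing with a run that covers no values.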
