Alex Behm has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/3299

Change subject: IMPALA-3646: Handle corrupt RLE literal or repeat counts of 0.
......................................................................

IMPALA-3646: Handle corrupt RLE literal or repeat counts of 0.

Adds handling and testing for a specific Parquet data corruption
scenario with plain dictionary encoded values.

The problematic scenario is when the repeat or literal count of
the RLE-encoded dictionary indexes is decoded as 0, which is an invalid value.
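For context: in the Parquet RLE/bit-packed hybrid encoding used for
dictionary indexes, each run starts with a ULEB128 varint header. If the
low bit is 0, the run is an RLE run with repeat count (header >> 1); if
it is 1, the run is a bit-packed literal run of (header >> 1) groups of
8 values. The following is a minimal C++ sketch of rejecting a zero
count while parsing such a header. RunKind, RunHeader and ParseRunHeader
are hypothetical names for illustration only; they are not Impala's
RleDecoder API in be/src/util/rle-encoding.h.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical types for illustration; not Impala's actual decoder API.
enum class RunKind { kRepeated, kLiteral };

struct RunHeader {
  RunKind kind;
  uint32_t count;       // number of values in the run
  size_t header_bytes;  // bytes consumed by the varint header
};

// Parses one run header of the Parquet RLE/bit-packed hybrid encoding.
// Returns false if the varint is truncated or too long, or if the decoded
// repeat/literal count is 0 (the corruption case this change handles).
bool ParseRunHeader(const uint8_t* buf, size_t len, RunHeader* out) {
  uint32_t header = 0;
  size_t i = 0;
  int shift = 0;
  while (true) {
    if (i >= len || shift > 28) return false;  // truncated or overlong varint
    const uint8_t byte = buf[i++];
    header |= static_cast<uint32_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) break;  // high bit clear: last varint byte
    shift += 7;
  }
  const uint32_t value = header >> 1;
  if (header & 1) {
    // Bit-packed literal run: 'value' groups of 8 values each.
    if (value > UINT32_MAX / 8) return false;
    out->kind = RunKind::kLiteral;
    out->count = value * 8;
  } else {
    // RLE run: one value repeated 'value' times.
    out->kind = RunKind::kRepeated;
    out->count = value;
  }
  if (out->count == 0) return false;  // a count of 0 is invalid/corrupt
  out->header_bytes = i;
  return true;
}
```

For example, the single byte 0x0A decodes to an RLE run of 5 values,
while 0x00 (repeat count 0) and 0x01 (literal count 0) are rejected. How
the real scanner then reports the error (the two new .test files suggest
both an abort and a non-abort mode) is not modeled here.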

There are several other cases of data corruption that are not yet
handled gracefully. This patch only handles one specific case.

Change-Id: Ibf406c82cdded37966f09c81e4cc1446d2b60d63
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/util/rle-encoding.h
M be/src/util/rle-test.cc
M testdata/data/README
A testdata/data/bad_rle_literal_count.parquet
A testdata/data/bad_rle_repeat_count.parquet
A testdata/workloads/functional-query/queries/QueryTest/parquet-corrupt-rle-counts-abort.test
A testdata/workloads/functional-query/queries/QueryTest/parquet-corrupt-rle-counts.test
M tests/query_test/test_scanners.py
9 files changed, 85 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/99/3299/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibf406c82cdded37966f09c81e4cc1446d2b60d63
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm <[email protected]>
