anujphadke has submitted this change and it was merged.

Change subject: IMPALA-3441, IMPALA-3659: check for malformed Avro data
......................................................................

IMPALA-3441, IMPALA-3659: check for malformed Avro data

This patch adds error checking to the Avro scanner (both the
codegen'd and interpreted paths), including out-of-bounds checks and
data validity checks.

I ran a local benchmark using the following queries:
  set num_scanner_threads=1;
  select count(i) from default.avro_bigints_big; # file contains only longs
  select max(l_orderkey) from biglineitem_avro; # file has tpch.lineitem schema

Both benchmark queries see negligible or no performance impact.

This patch adds a new Avro scanner unit test and an end-to-end test
that queries several corrupted files, and updates the zig-zag
varlen int unit test.

Change-Id: I801a11c496a128e02c564c2a9c44baa5a97be132
Reviewed-on: http://gerrit.cloudera.org:8080/3072
Reviewed-by: Dan Hecht <[email protected]>
Tested-by: Internal Jenkins
(cherry picked from commit fbb41c69a0102796979628b1a4925e96cbc967f0)
Reviewed-on: http://gerrit.sjc.cloudera.com:8080/13615
Reviewed-by: Anuj Phadke <[email protected]>
Tested-by: Anuj Phadke <[email protected]>
(cherry picked from commit ed6291885407e522ea58f208b24ab6fd127be072)
Reviewed-on: http://gerrit.cloudera.org:8080/3537
Reviewed-by: anujphadke <[email protected]>
Tested-by: anujphadke <[email protected]>
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/base-sequence-scanner.h
M be/src/exec/hdfs-avro-scanner-ir.cc
A be/src/exec/hdfs-avro-scanner-test.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/hdfs-avro-table-writer.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/read-write-util.cc
M be/src/exec/read-write-util.h
M be/src/exec/scanner-context.cc
M be/src/exec/scanner-context.h
M be/src/exec/scanner-context.inline.h
M be/src/exec/zigzag-test.cc
M common/thrift/generate_error_codes.py
A testdata/bad_avro_snap/README
A testdata/bad_avro_snap/invalid_union.avro
A testdata/bad_avro_snap/negative_string_len.avro
A testdata/bad_avro_snap/truncated_float.avro
A testdata/bad_avro_snap/truncated_string.avro
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
M tests/common/test_result_verifier.py
M tests/data_errors/test_data_errors.py
26 files changed, 1,202 insertions(+), 199 deletions(-)

Approvals:
  anujphadke: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/3537
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I801a11c496a128e02c564c2a9c44baa5a97be132
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-2.5.0_5.7.x
Gerrit-Owner: anujphadke <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]>
Gerrit-Reviewer: anujphadke <[email protected]>
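
Illustrative note on the checks described in the commit message: the sketch below shows one way an out-of-bounds and validity check for a zig-zag variable-length long could look. It is not the code from this patch. The names ZLongResult and ReadZLongChecked are hypothetical, and the 10-byte limit is an assumption based on the fact that a 64-bit value needs at most ten 7-bit groups.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type for this sketch; not Impala's actual API.
struct ZLongResult {
  bool ok;            // false if the input was truncated or overlong
  int64_t value;      // decoded value, valid only when ok is true
  size_t bytes_read;  // number of bytes consumed, valid only when ok is true
};

// Decodes one zig-zag varint-encoded 64-bit integer from buf[0..len).
// Fails instead of reading past the end of the buffer (truncated data)
// and fails if no terminating byte appears within 10 bytes (invalid data).
inline ZLongResult ReadZLongChecked(const uint8_t* buf, size_t len) {
  constexpr size_t kMaxBytes = 10;  // assumption: ceil(64 / 7) groups
  uint64_t raw = 0;
  for (size_t i = 0; i < kMaxBytes; ++i) {
    if (i >= len) return {false, 0, 0};  // out-of-bounds check
    const uint8_t b = buf[i];
    raw |= static_cast<uint64_t>(b & 0x7F) << (7 * i);
    if ((b & 0x80) == 0) {
      // Zig-zag decode: even raw values map to non-negative results,
      // odd raw values map to negative results.
      const uint64_t decoded = (raw >> 1) ^ (~(raw & 1) + 1);
      return {true, static_cast<int64_t>(decoded), i + 1};
    }
  }
  return {false, 0, 0};  // continuation bit still set after 10 bytes
}
```

A caller would check `ok` before using `value`, and report a parse error for the file otherwise, which is roughly the behavior the corrupted test files (for example a truncated or negative-length field) are meant to exercise.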
