Bharath Vissapragada has uploaded a new change for review. http://gerrit.cloudera.org:8080/3045
Change subject: IMPALA:3314: Fix avro scanner crash in partitioned multi-file-format tables
......................................................................

IMPALA:3314: Fix avro scanner crash in partitioned multi-file-format tables

Bug: Impalads crash when querying a partitioned table that has multiple
file formats, where at least one partition is Avro and the base table is
non-Avro.

Cause: avroSchema_ is not set in HdfsTable during metadata load unless the
base table is backed by the Avro file format. The schema is therefore never
propagated to the Avro scanner, which has no checks to make sure the schema
is non-null.

Fix: The fix has two parts.
1. The Avro scanner now handles a missing Avro schema gracefully.
   Appropriate null checks have been added.
2. avroSchema_ is now set in HdfsTable if any partition is backed by Avro,
   even when the base table is not. The code that sets avroSchema_ is
   decoupled from HdfsTable#loadSchema() because the partition metadata is
   needed to make this decision, so the schema is set once the partition
   information is loaded.

Testing: This patch adds a new table 'multifileformat_tbl' to the functional
test schema. The base table uses the TEXTFILE format and has 4 partitions,
each with a different file format (text, parquet, avro, rcfile). We run a
count(*) on this table to verify the row count. Without this patch, this
query deterministically crashes the impalads.
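The two parts of the fix can be illustrated with a minimal sketch. This is not the code in the patch: the class MixedFormatSchemaSketch, its FileFormat enum, and the methods resolveAvroSchema and openAvroScanner are hypothetical stand-ins for the logic in HdfsTable and the Avro scanner (the real scanner is C++), and the error strings are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in; none of these names come from Impala's catalog code.
class MixedFormatSchemaSketch {
    enum FileFormat { TEXT, PARQUET, AVRO, RC_FILE }

    /** Returns true if at least one partition is stored as Avro. */
    static boolean hasAvroPartition(List<FileFormat> partitionFormats) {
        return partitionFormats.contains(FileFormat.AVRO);
    }

    /**
     * Part 2 of the fix: decide on the table-level Avro schema only after the
     * partition formats are known, so a non-Avro base table with an Avro
     * partition still gets a schema.
     */
    static Optional<String> resolveAvroSchema(FileFormat baseFormat,
                                              List<FileFormat> partitionFormats,
                                              String schemaJson) {
        boolean needsSchema =
            baseFormat == FileFormat.AVRO || hasAvroPartition(partitionFormats);
        return needsSchema ? Optional.ofNullable(schemaJson) : Optional.empty();
    }

    /**
     * Part 1 of the fix: the scanner side reports an error instead of
     * dereferencing a schema that was never set.
     */
    static String openAvroScanner(Optional<String> schema) {
        if (!schema.isPresent()) {
            return "ERROR: missing Avro schema";
        }
        return "OK";
    }

    public static void main(String[] args) {
        List<FileFormat> partitions = Arrays.asList(
            FileFormat.TEXT, FileFormat.PARQUET, FileFormat.AVRO, FileFormat.RC_FILE);
        Optional<String> schema =
            resolveAvroSchema(FileFormat.TEXT, partitions, "{\"type\":\"record\"}");
        System.out.println(openAvroScanner(schema));
        // The pre-fix situation: no schema reaches the scanner.
        System.out.println(openAvroScanner(Optional.<String>empty()));
    }
}
```

In this sketch the schema decision takes the partition formats as an input, which mirrors why the patch moves the assignment out of HdfsTable#loadSchema(): the answer depends on metadata that is only available after partitions are loaded.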
Change-Id: I09262d3a7b85a2263c721f3beafd0cab2a1bdf4b
---
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M fe/src/main/java/com/cloudera/impala/catalog/HdfsPartition.java
M fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-query/queries/QueryTest/mixed-format.test
7 files changed, 114 insertions(+), 19 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/45/3045/1
--
To view, visit http://gerrit.cloudera.org:8080/3045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I09262d3a7b85a2263c721f3beafd0cab2a1bdf4b
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <[email protected]>
