Alex Behm has posted comments on this change. Change subject: IMPALA:3314: Fix avro scanner crash in partitioned multi-file-format tables ......................................................................
Patch Set 1: (18 comments) http://gerrit.cloudera.org:8080/#/c/3045/1//COMMIT_MSG Commit Message: Line 7: IMPALA:3314: Fix avro scanner crash in partitioned multi-file-format tables I think a more appropriate title would be something lile: Fix querying tables/partitions altered to Avro format. Line 9: Bug: Impalads crash if we query a partitioned table with multiple file Description seems too complicated. I think Impalads will crash when querying a table or partition that was altered to the Avro format. Line 12: Cause: This happens because we don't set avroSchema_ in HdfsTable during during metadata load -> during the alteration operation please make other appropriate changes to this paragraph to reflect the more general problem http://gerrit.cloudera.org:8080/#/c/3045/1/be/src/exec/hdfs-avro-scanner.cc File be/src/exec/hdfs-avro-scanner.cc: Line 63: return Status("Missing avro schema in scan node"); Let's include the possible workaround. something like: "This could indicate stale metadata if the table was recently altered. Running invalidate metadata <table> on this table may resolve the problem." Line 739: if (record.schema == NULL) return Status("Missing avro schema in scan node"); same comment regarding mentioning workarounds http://gerrit.cloudera.org:8080/#/c/3045/1/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java: Line 1007: if (loadTableSchema) loadSchema(client, msTbl); call setAvroSchema() in the loadTableSchema case Line 1012: setAvroSchema(client, msTbl); why do we need this? Line 1261: * Sets avroSchema_ if the table or any of the partitions in the table are backed are stored as Avro. Line 1262: * by AVRO file format. Additionally this method also reconciles the schema if missing comma after Additionally ... reconciles the schemas ... Line 1263: * the column definitions from the metastore differ from the actual schema. 'actual schema' is vague, say 'differ from the Avro schema' Line 1267: String inputFormat = msTbl.getSd().getInputFormat(); Add Preconditions check that asserts loadSchema() has already been called. Line 1268: if (HdfsFileFormat.fromJavaClassName(inputFormat) == HdfsFileFormat.AVRO || Is this first check still needed? An unpartitioned table should have a single partition. Line 1270: // Look for avro schema in TBLPROPERTIES and in SERDEPROPERTIES, with the latter Use "Avro" everywhere consistently Line 1277: // Set the avroSchema_ if it is found in any of the search locations. This is Comment could be misleading because it's not just "useful" in the described case. This step is required, and regardless of partitioned/unpartitioned. Line 1291: // At this point, nonPartFieldSchemas_ already has the field schemas from HMS. Confusing comment because this is a precondition of this function, and not really applicable here specifically. Line 1330: * Loads table schema from Hive Metastore. It also loads column stats. Loads the table schema and column stats from the Hive Metastore. http://gerrit.cloudera.org:8080/#/c/3045/1/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: Line 1902: functional Let's try not to add one-off-tables for bugfixes in the schema template. Imo, this be a self-contained pytest in TestAvroSchemaResolution. http://gerrit.cloudera.org:8080/#/c/3045/1/testdata/workloads/functional-query/queries/QueryTest/mixed-format.test File testdata/workloads/functional-query/queries/QueryTest/mixed-format.test: Line 28: # IMPALA-3314: Run a query on multi-file-format table with an avro partition This test feels more like it should go into TestAvroSchemaResolution. Imo, the testing has two parts: * test that Impala gets back the expected error message when altering to Avro without a metadata reload * when you do a metadata reload the table works just finr -- To view, visit http://gerrit.cloudera.org:8080/3045 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I09262d3a7b85a2263c721f3beafd0cab2a1bdf4b Gerrit-PatchSet: 1 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-HasComments: Yes
