Alex Behm has posted comments on this change.

Change subject: IMPALA:3314: Fix avro scanner crash in partitioned 
multi-file-format tables
......................................................................


Patch Set 1:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/3045/1//COMMIT_MSG
Commit Message:

Line 7: IMPALA:3314: Fix avro scanner crash in partitioned multi-file-format 
tables
I think a more appropriate title would be something lile:

Fix querying tables/partitions altered to Avro format.


Line 9: Bug: Impalads crash if we query a partitioned table with multiple file
Description seems too complicated. I think Impalads will crash when querying a 
table or partition that was altered to the Avro format.


Line 12: Cause: This happens because we don't set avroSchema_ in HdfsTable 
during
during metadata load -> during the alteration operation

please make other appropriate changes to this paragraph to reflect the more 
general problem


http://gerrit.cloudera.org:8080/#/c/3045/1/be/src/exec/hdfs-avro-scanner.cc
File be/src/exec/hdfs-avro-scanner.cc:

Line 63:     return Status("Missing avro schema in scan node");
Let's include the possible workaround. something like:

"This could indicate stale metadata if the table was recently altered. Running 
invalidate metadata <table> on this table may resolve the problem."


Line 739:   if (record.schema == NULL) return Status("Missing avro schema in 
scan node");
same comment regarding mentioning workarounds


http://gerrit.cloudera.org:8080/#/c/3045/1/fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java
File fe/src/main/java/com/cloudera/impala/catalog/HdfsTable.java:

Line 1007:       if (loadTableSchema) loadSchema(client, msTbl);
call setAvroSchema() in the loadTableSchema case


Line 1012:         setAvroSchema(client, msTbl);
why do we need this?


Line 1261:    * Sets avroSchema_ if the table or any of the partitions in the 
table are backed
are stored as Avro.


Line 1262:    * by AVRO file format. Additionally this method also reconciles 
the schema if
missing comma after Additionally

... reconciles the schemas ...


Line 1263:    * the column definitions from the metastore differ from the 
actual schema.
'actual schema' is vague, say 'differ from the Avro schema'


Line 1267:     String inputFormat = msTbl.getSd().getInputFormat();
Add Preconditions check that asserts loadSchema() has already been called.


Line 1268:     if (HdfsFileFormat.fromJavaClassName(inputFormat) == 
HdfsFileFormat.AVRO ||
Is this first check still needed? An unpartitioned table should have a single 
partition.


Line 1270:       // Look for avro schema in TBLPROPERTIES and in 
SERDEPROPERTIES, with the latter
Use "Avro" everywhere consistently


Line 1277:       // Set the avroSchema_ if it is found in any of the search 
locations. This is
Comment could be misleading because it's not just "useful" in the described 
case. This step is required, and regardless of partitioned/unpartitioned.


Line 1291:       // At this point, nonPartFieldSchemas_ already has the field 
schemas from HMS.
Confusing comment because this is a precondition of this function, and not 
really applicable here specifically.


Line 1330:    * Loads table schema from Hive Metastore. It also loads column 
stats.
Loads the table schema and column stats from the Hive Metastore.


http://gerrit.cloudera.org:8080/#/c/3045/1/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

Line 1902: functional
Let's try not to add one-off-tables for bugfixes in the schema template. Imo, 
this be a self-contained pytest in TestAvroSchemaResolution.


http://gerrit.cloudera.org:8080/#/c/3045/1/testdata/workloads/functional-query/queries/QueryTest/mixed-format.test
File testdata/workloads/functional-query/queries/QueryTest/mixed-format.test:

Line 28: # IMPALA-3314: Run a query on multi-file-format table with an avro 
partition
This test feels more like it should go into TestAvroSchemaResolution.

Imo, the testing has two parts:
* test that Impala gets back the expected error message when altering to Avro 
without a metadata reload
* when you do a metadata reload the table works just finr


-- 
To view, visit http://gerrit.cloudera.org:8080/3045
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I09262d3a7b85a2263c721f3beafd0cab2a1bdf4b
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-HasComments: Yes

Reply via email to