Alex Behm has posted comments on this change. Change subject: IMPALA-3687: Use type from Avro schema during schema reconciliation. ......................................................................
Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/3331/4//COMMIT_MSG Commit Message: Line 7: IMPALA-3687: Use type from avro schema during schema reconcilation. Always prefer the Avro field name during schema reconciliation. Line 9: There is no effect that we change the name of a string/char/varchar This blurb is very difficult to follow. It would be good to summarize the new behavior, i.e., in case of inconsistencies we generally prefer the name, type, and comment from the Avro schema, except for CHAR/VARCHAR where we retain the CHAR/VARCHAR type but keep the name and comment from the Avro column. http://gerrit.cloudera.org:8080/#/c/3331/2/fe/src/main/java/com/cloudera/impala/util/AvroSchemaUtils.java File fe/src/main/java/com/cloudera/impala/util/AvroSchemaUtils.java: PS2, Line 116: : : : > I know but the code does not "reconcile" that at all right? Bharath is correct that the comment had an original intention. The comment was put here to describe what will happen in cases like "create table t (ts timestamp) stored as avro". The resulting table will have a STRING type. But the 'root cause' of this behavior is actually in how we synthesize an Avro schema from the column definitions, so the comment is somewhat out of place here. Let's move this comment into AvroSchemaConverter.java and rephrase as appropriate. http://gerrit.cloudera.org:8080/#/c/3331/4/fe/src/main/java/com/cloudera/impala/util/AvroSchemaUtils.java File fe/src/main/java/com/cloudera/impala/util/AvroSchemaUtils.java: PS4, Line 113: A grammar; also could be rephrased for clarity, e.g.: A CHAR/VARCHAR column definition maps to an Avro STRING. The reconciled column will preserve the CHAR/VARCHAR type but prefer the column name and comment from the Avro column. also move this blurb to L134 http://gerrit.cloudera.org:8080/#/c/3331/2/testdata/workloads/functional-query/queries/QueryTest/avro-schema-changes.test File testdata/workloads/functional-query/queries/QueryTest/avro-schema-changes.test: PS2, Line 71: # IMPALA-3687 > I am fine with anything.. what do you think @Alex? Agree that avro-schema-resolution.test is more appropriate. http://gerrit.cloudera.org:8080/#/c/3331/4/testdata/workloads/functional-query/queries/QueryTest/avro-schema-changes.test File testdata/workloads/functional-query/queries/QueryTest/avro-schema-changes.test: Line 71: # IMPALA-3687. Create a table that has different avro schema from its column definition Create a table with mismatched column definitions and Avro schema. Line 86: LOCATION '$FILESYSTEM_PREFIX/test-warehouse/alltypesaggmultifilesnopart_avro_snap' remove this clause - it only adds confusion Line 95: {"type": ["float", "null"], "name": "float_col_changed", "doc": "float comment changed"}, typo: changed -- To view, visit http://gerrit.cloudera.org:8080/3331 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia3e43b2885853c2b4f207a45a873c9d7f31379cd Gerrit-PatchSet: 2 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Huaisi Xu <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Huaisi Xu <[email protected]> Gerrit-HasComments: Yes
