[ https://issues.apache.org/jira/browse/IMPALA-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang updated IMPALA-2229:
-----------------------------------
Epic Link: IMPALA-12887
> Inconsistent behavior between Impala and Hive when creating an Avro table with an Avro schema in SERDEPROPERTIES and TBLPROPERTIES.
> -----------------------------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-2229
> URL: https://issues.apache.org/jira/browse/IMPALA-2229
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Affects Versions: Impala 1.3, Impala 1.4, Impala 2.0, Impala 2.1, Impala 2.2
> Reporter: Alexander Behm
> Priority: Minor
> Labels: incompatibility
>
> Impala and Hive appear to search the possible locations for an Avro schema
> in different orders. The following CREATE TABLE statement sets a different
> avro.schema.literal in SERDEPROPERTIES and in TBLPROPERTIES, and the two
> systems pick different schemas:
> {code}
> CREATE TABLE t
> ROW FORMAT SERDE
> 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> WITH SERDEPROPERTIES
> ('avro.schema.literal'='{"name": "my_record", "type": "record",
> "fields": [{"name": "serde_string", "type": "string"}]}')
> TBLPROPERTIES
> ('avro.schema.literal'='{"name": "my_record", "type": "record",
> "fields": [{"name": "tblprop_string", "type": "string"}]}');
> {code}
> Run the CREATE TABLE and DESC in Hive:
> {code}
> hive> CREATE TABLE t
> > ROW FORMAT SERDE
> > 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> > WITH SERDEPROPERTIES
> > ('avro.schema.literal'='{"name": "my_record", "type": "record",
> > "fields": [{"name": "serde_string", "type": "string"}]}')
> > TBLPROPERTIES
> > ('avro.schema.literal'='{"name": "my_record", "type": "record",
> > "fields": [{"name": "tblprop_string", "type": "string"}]}');
> OK
> Time taken: 0.689 seconds
> hive> desc t;
> OK
> tblprop_string string from deserializer
> Time taken: 0.224 seconds, Fetched: 1 row(s)
> hive>
> {code}
> Run the CREATE TABLE and DESC in Impala. Note that Impala's syntax is
> slightly different: it uses STORED AS AVRO instead of ROW FORMAT SERDE.
> {code}
> [localhost:21000] > CREATE TABLE t
> > WITH SERDEPROPERTIES
> > ('avro.schema.literal'='{"name": "my_record", "type": "record",
> > "fields": [{"name": "serde_string", "type": "string"}]}')
> > STORED AS AVRO
> > TBLPROPERTIES
> > ('avro.schema.literal'='{"name": "my_record", "type": "record",
> > "fields": [{"name": "tblprop_string", "type": "string"}]}');
> Query: create TABLE t
> WITH SERDEPROPERTIES
> ('avro.schema.literal'='{"name": "my_record", "type": "record",
> "fields": [{"name": "serde_string", "type": "string"}]}')
> STORED AS AVRO
> TBLPROPERTIES
> ('avro.schema.literal'='{"name": "my_record", "type": "record",
> "fields": [{"name": "tblprop_string", "type": "string"}]}')
> WARNINGS: Ignoring column definitions in favor of Avro schema.
> The Avro schema has 1 column(s) but 0 column definition(s) were given.
> Fetched 0 row(s) in 0.32s
> [localhost:21000] > desc t;
> Query: describe t
> +--------------+--------+-------------------+
> | name | type | comment |
> +--------------+--------+-------------------+
> | serde_string | string | from deserializer |
> +--------------+--------+-------------------+
> Fetched 1 row(s) in 4.83s
> {code}
> The relevant code can be found in CreateTableStmt.java and HdfsTable.java.
> It adds SERDEPROPERTIES to the search locations before TBLPROPERTIES:
> {code}
> // Look for the schema in TBLPROPERTIES and in SERDEPROPERTIES, with the latter
> // taking precedence.
> List<Map<String, String>> schemaSearchLocations = Lists.newArrayList();
> schemaSearchLocations.add(
> getMetaStoreTable().getSd().getSerdeInfo().getParameters());
> schemaSearchLocations.add(getMetaStoreTable().getParameters());
> {code}
> We should make Impala behave consistently with Hive. However, this is an
> incompatible change, so we will need to schedule the fix accordingly.
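The following is a minimal Java sketch of the precedence difference, built on two assumptions. First, it assumes the lookup returns the schema from the first search location that defines `avro.schema.literal`. That reading matches the comment in the snippet above and Impala's DESC output. Second, it ignores other schema properties such as `avro.schema.url`. The class and method names (`AvroSchemaLookupSketch`, `findSchema`, `impalaOrder`, `hiveCompatibleOrder`) are hypothetical and do not come from the Impala codebase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Impala's actual code: models how the order of the
// Avro schema search locations decides which avro.schema.literal wins when the
// property is set in both SERDEPROPERTIES and TBLPROPERTIES.
class AvroSchemaLookupSketch {
  static final String SCHEMA_LITERAL = "avro.schema.literal";

  // Returns the schema from the first location that defines it, or null.
  // (Assumed first-match semantics.)
  static String findSchema(List<Map<String, String>> searchLocations) {
    for (Map<String, String> location : searchLocations) {
      String literal = location.get(SCHEMA_LITERAL);
      if (literal != null) {
        return literal;
      }
    }
    return null;
  }

  // Order used by the quoted Impala snippet: SERDEPROPERTIES, then TBLPROPERTIES.
  static String impalaOrder(Map<String, String> serdeProps, Map<String, String> tblProps) {
    List<Map<String, String>> locations = new ArrayList<>();
    locations.add(serdeProps);
    locations.add(tblProps);
    return findSchema(locations);
  }

  // Order matching the Hive behavior observed above: TBLPROPERTIES first.
  static String hiveCompatibleOrder(Map<String, String> serdeProps, Map<String, String> tblProps) {
    List<Map<String, String>> locations = new ArrayList<>();
    locations.add(tblProps);
    locations.add(serdeProps);
    return findSchema(locations);
  }

  public static void main(String[] args) {
    // The two schemas from the CREATE TABLE statement in this report.
    Map<String, String> serde = Map.of(SCHEMA_LITERAL,
        "{\"name\": \"my_record\", \"type\": \"record\", "
            + "\"fields\": [{\"name\": \"serde_string\", \"type\": \"string\"}]}");
    Map<String, String> tbl = Map.of(SCHEMA_LITERAL,
        "{\"name\": \"my_record\", \"type\": \"record\", "
            + "\"fields\": [{\"name\": \"tblprop_string\", \"type\": \"string\"}]}");
    // Prints "true" twice: Impala's order picks serde_string, Hive's picks tblprop_string.
    System.out.println(impalaOrder(serde, tbl).contains("serde_string"));
    System.out.println(hiveCompatibleOrder(serde, tbl).contains("tblprop_string"));
  }
}
```

Under the first-match assumption, one possible fix would be to swap the two `schemaSearchLocations.add(...)` calls so that TBLPROPERTIES is searched first, as in Hive. Tables that rely on the current SERDEPROPERTIES precedence would then resolve a different schema, which is why the change is incompatible.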
--
This message was sent by Atlassian Jira
(v8.20.10#820010)