Lars Volker created HIVE-14086:
----------------------------------
Summary: org.apache.hadoop.hive.metastore.api.Table does not return columns from Avro schema file
Key: HIVE-14086
URL: https://issues.apache.org/jira/browse/HIVE-14086
Project: Hive
Issue Type: Bug
Components: API
Reporter: Lars Volker
Consider this table, using an external Avro schema file:
{noformat}
CREATE TABLE avro_table
PARTITIONED BY (str_part STRING)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES (
'avro.schema.url'='hdfs://localhost:20500/tmp/avro.json'
);
{noformat}
This populates the {{COLUMNS_V2}} metastore table with the correct column
information (as per HIVE-6308). The columns of this table can then be retrieved
via the Hive metastore API, for example by calling {{getSd().getCols()}} on the
{{org.apache.hadoop.hive.metastore.api.Table}} object returned by {{get_table}}.
Changes to the Avro schema, whether by pointing {{avro.schema.url}} at a
different file or by editing the contents of the file it references, are
reflected in the output of {{DESCRIBE FORMATTED avro_table}} *but not* in the
result of the {{getSd().getCols()}} API call. It appears that Hive reads the
Avro schema file internally but does not expose the resulting column information
through its API.
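For example, statements along these lines should reproduce the discrepancy. This is a sketch: the second schema file {{avro_v2.json}} is a hypothetical path, assumed to contain a schema whose fields differ from those in {{avro.json}}.

{noformat}
-- Point the table at a different Avro schema file (hypothetical path).
ALTER TABLE avro_table SET TBLPROPERTIES (
'avro.schema.url'='hdfs://localhost:20500/tmp/avro_v2.json'
);

-- The column list printed here reflects the fields of the new schema file.
DESCRIBE FORMATTED avro_table;
{noformat}

Per the behaviour described above, a subsequent {{get_table}} call still returns the original column list from {{getSd().getCols()}}.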
Is there a way to obtain the effective table schema via the Hive API? Would it
make sense to fix table retrieval so that {{get_table}} returns the correct set
of columns?