Zhenxiao Luo created HIVE-3442: ---------------------------------- Summary: AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table Key: HIVE-3442 URL: https://issues.apache.org/jira/browse/HIVE-3442 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Zhenxiao Luo Assignee: Zhenxiao Luo Fix For: 0.10.0
After creating a table and load data into it, I could check that the table is created successfully, and data is inside: DROP TABLE IF EXISTS ml_items; CREATE TABLE ml_items(id INT, title STRING, release_date STRING, video_release_date STRING, imdb_url STRING, unknown_genre TINYINT, action TINYINT, adventure TINYINT, animation TINYINT, children TINYINT, comedy TINYINT, crime TINYINT, documentary TINYINT, drama TINYINT, fantasy TINYINT, film_noir TINYINT, horror TINYINT, musical TINYINT, mystery TINYINT, romance TINYINT, sci_fi TINYINT, thriller TINYINT, war TINYINT, western TINYINT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE; LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items; select * from ml_items ORDER BY id ASC; While, the following create external table with AvroSerDe is not working: DROP TABLE IF EXISTS ml_items_as_avro; CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:${system:test.tmp.dir}/hive-ml-items'; describe ml_items_as_avro; INSERT OVERWRITE TABLE ml_items_as_avro SELECT id, title, imdb_url, unknown_genre, action, adventure, animation, children, comedy, crime, documentary, drama, fantasy, film_noir, horror, musical, mystery, romance, sci_fi, thriller, war, western FROM ml_items; ml_items_as_avro is not created with expected schema, as shown in the "describe ml_items_as_avro" output. The output is below: PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro PREHOOK: type: DROPTABLE POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro POSTHOOK: type: DROPTABLE PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items' PREHOOK: type: CREATETABLE POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items' POSTHOOK: type: CREATETABLE POSTHOOK: Output: default@ml_items_as_avro PREHOOK: query: describe ml_items_as_avro PREHOOK: type: DESCTABLE POSTHOOK: query: describe ml_items_as_avro POSTHOOK: type: DESCTABLE error_error_error_error_error_error_error string from deserializer cannot_determine_schema string from deserializer check string from deserializer schema string from deserializer url string from deserializer and string from deserializer literal string from deserializer FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'ml_items_as_avro': Table insclause-0 has 7 columns, but query has 22 columns. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira