Zhenxiao Luo created HIVE-3442:
----------------------------------

             Summary: AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not 
working when creating external table
                 Key: HIVE-3442
                 URL: https://issues.apache.org/jira/browse/HIVE-3442
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.10.0
            Reporter: Zhenxiao Luo
            Assignee: Zhenxiao Luo
             Fix For: 0.10.0


After creating a table and load data into it, I could check that the table is 
created successfully, and data is inside:

DROP TABLE IF EXISTS ml_items;

CREATE TABLE ml_items(id INT,
                      title STRING,
                      release_date STRING,
                      video_release_date STRING,
                      imdb_url STRING,
                      unknown_genre TINYINT,
                      action TINYINT,
                      adventure TINYINT,
                      animation TINYINT,
                      children TINYINT,
                      comedy TINYINT,
                      crime TINYINT,
                      documentary TINYINT,
                      drama TINYINT,
                      fantasy TINYINT,
                      film_noir TINYINT,
                      horror TINYINT,
                      musical TINYINT,
                      mystery TINYINT,
                      romance TINYINT,
                      sci_fi TINYINT,
                      thriller TINYINT,
                      war TINYINT,
                      western TINYINT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
  STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items;

select * from ml_items ORDER BY id ASC;

While, the following create external table with AvroSerDe is not working:

DROP TABLE IF EXISTS ml_items_as_avro;
CREATE EXTERNAL TABLE ml_items_as_avro
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  WITH SERDEPROPERTIES (
    'schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc')
  STORED as INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  LOCATION 'file:${system:test.tmp.dir}/hive-ml-items';

describe ml_items_as_avro;

INSERT OVERWRITE TABLE ml_items_as_avro
  SELECT id, title,
    imdb_url, unknown_genre, action, adventure, animation, children, comedy, 
crime,
    documentary, drama, fantasy, film_noir, horror, musical, mystery, romance,
    sci_fi, thriller, war, western
  FROM ml_items;

ml_items_as_avro is not created with expected schema, as shown in the "describe 
ml_items_as_avro" output. The output is below:


PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
PREHOOK: type: DROPTABLE
POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
POSTHOOK: type: DROPTABLE
PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  WITH SERDEPROPERTIES (
    'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
  STORED as INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
PREHOOK: type: CREATETABLE
POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  WITH SERDEPROPERTIES (
    'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
  STORED as INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: default@ml_items_as_avro
PREHOOK: query: describe ml_items_as_avro
PREHOOK: type: DESCTABLE
POSTHOOK: query: describe ml_items_as_avro
POSTHOOK: type: DESCTABLE
error_error_error_error_error_error_error   string  from deserializer
cannot_determine_schema string  from deserializer
check   string  from deserializer
schema  string  from deserializer
url string  from deserializer
and string  from deserializer
literal string  from deserializer
FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target 
table because column number/types are different 'ml_items_as_avro': Table 
insclause-0 has 7 columns, but query has 22 columns.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to