Tom Snee created HIVE-9312:
------------------------------

             Summary: Literal string "\n" confuses Avro SerDe
                 Key: HIVE-9312
                 URL: https://issues.apache.org/jira/browse/HIVE-9312
             Project: Hive
          Issue Type: Bug
          Components: Serializers/Deserializers
    Affects Versions: 0.13.0
         Environment: Hortonworks Data Platform 2.1.2.1 on CentOS 6.5
            Reporter: Tom Snee


Avro files whose string fields contain the literal two-character sequence "\n" 
(a backslash followed by 'n') confuse the Avro SerDe: a filtered query against 
such a file returns garbled rows.
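
For readers unsure what kind of value is meant, here is a minimal Python sketch 
of the difference between a literal backslash followed by 'n' and a real 
newline. The field name "note" and the values are hypothetical; they are not 
taken from the attached example.json. The sketch only illustrates the data and 
does not diagnose the SerDe.

```python
import json

# Hypothetical documents; "note" is an illustrative field name only.
# In JSON, "\\n" is an escaped backslash followed by the letter n.
literal_doc = r'{"note": "a\\nb"}'
# In JSON, "\n" is the escape sequence for a real newline character.
newline_doc = r'{"note": "a\nb"}'

literal_value = json.loads(literal_doc)["note"]
newline_value = json.loads(newline_doc)["note"]

# The literal form is four characters: a, backslash, n, b.
print(repr(literal_value), len(literal_value))
# The newline form is three characters: a, newline, b.
print(repr(newline_value), len(newline_value))
```

The first form is the one this report is about: the string holds a backslash 
and an 'n', not a line break.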

Steps to reproduce:
1. Put attached schema nested.avsc into HDFS under /user/someone.
2. Convert attached JSON file example.json into Avro with avro-tools, like so:
java -jar avro-tools-1.7.7.jar fromjson --schema-file nested.avsc example.json > example.avro
3. Put example.avro into HDFS under /user/someone/avro-files.
4. Create a Hive table with this statement:
CREATE EXTERNAL TABLE avro_table
    ROW FORMAT SERDE
    'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS INPUTFORMAT
    'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    LOCATION
    '/user/someone/avro-files/'
    TBLPROPERTIES (
        'avro.schema.url'='hdfs:///user/someone/nested.avsc'
    );
5. Observe that "select * from avro_table;" returns one row, as expected.
6. Observe that the following query returns 13 garbled rows:
select * from avro_table where mastersubjectnumber='A12B3CDE-FGH4-5I67-89J0-KLMN1OPQ23R4';



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
