[ 
https://issues.apache.org/jira/browse/HIVE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125117#comment-15125117
 ] 

Swarnim Kulkarni commented on HIVE-6147:
----------------------------------------

{noformat}
 it tries to retrieve the write schema from data (ws = 
retrieveSchemaFromBytes(data)) even if the schema URL (reader schema) had been 
provided
{noformat}

Correct. That is the default behavior. The writer schema defaults to the reader 
schema if one has not been provided. If a reader schema has been provided (as in 
your case), it is read from the given URL, but the writer schema is still taken 
from the data. If you want to provide the writer schema as well, I would 
recommend taking a look at the AvroSchemaRetriever[1]. You can write a custom 
implementation of it and supply both the reader and writer schemas from any 
custom source you like. A test implementation can be found here for 
reference[2], and the corresponding test that uses this implementation is 
here[3]. Once done, simply plug it in with the "avro.schema.retriever" 
property. One caveat is that this currently applies to the whole table and not 
to individual columns, so it assumes that there is a uniform schema across the 
table.
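
To make the shape of such an implementation concrete, here is a rough sketch. 
It is not Hive code: SchemaRetrieverStandIn is a local stand-in for 
AvroSchemaRetriever so that the example compiles without Hive or Avro on the 
classpath, schemas are plain JSON strings rather than org.apache.avro.Schema 
objects, and the "example.*" property names are made up. The three method names 
follow the linked class as I remember it, so please check [1] and [2] for the 
exact signatures and for the constructor Hive expects.

```java
import java.util.Properties;

// Local stand-in for Hive's AvroSchemaRetriever (an assumption for this sketch).
// The real class works with org.apache.avro.Schema instead of String.
abstract class SchemaRetrieverStandIn {
    // Schema the data was written with.
    abstract String retrieveWriterSchema(Object source);

    // Schema to read the data with; null means no separate reader schema.
    String retrieveReaderSchema(Object source) {
        return null;
    }

    // Number of leading bytes to skip before the Avro payload starts.
    int getOffset() {
        return 0;
    }
}

// Hypothetical retriever that takes both schemas from table properties.
// "example.writer.schema" and "example.reader.schema" are invented names.
class PropertiesSchemaRetriever extends SchemaRetrieverStandIn {
    private final String writerSchema;
    private final String readerSchema;

    PropertiesSchemaRetriever(Properties tbl) {
        this.writerSchema = tbl.getProperty("example.writer.schema");
        // Fall back to the writer schema when no reader schema is configured.
        this.readerSchema = tbl.getProperty("example.reader.schema", writerSchema);
    }

    @Override
    String retrieveWriterSchema(Object source) {
        return writerSchema;
    }

    @Override
    String retrieveReaderSchema(Object source) {
        return readerSchema;
    }
}
```

A real implementation would extend AvroSchemaRetriever directly and would then 
be named in the "avro.schema.retriever" property, presumably by its fully 
qualified class name as in the linked test[3].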

Hope this helps. Let me know if there are any additional questions.

[1] 
https://github.com/apache/hive/blob/release-1.2.1/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSchemaRetriever.java
[2] 
https://github.com/apache/hive/blob/release-1.2.1/hbase-handler/src/test/org/apache/hadoop/hive/hbase/HBaseTestAvroSchemaRetriever.java
[3] 
https://github.com/apache/hive/blob/release-1.2.1/hbase-handler/src/test/org/apache/hadoop/hive/hbase/TestHBaseSerDe.java#L1293-L1344

> Support avro data stored in HBase columns
> -----------------------------------------
>
>                 Key: HIVE-6147
>                 URL: https://issues.apache.org/jira/browse/HIVE-6147
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>    Affects Versions: 0.12.0, 0.13.0
>            Reporter: Swarnim Kulkarni
>            Assignee: Swarnim Kulkarni
>              Labels: TODOC14
>             Fix For: 0.14.0
>
>         Attachments: HIVE-6147.1.patch.txt, HIVE-6147.2.patch.txt, 
> HIVE-6147.3.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.4.patch.txt, 
> HIVE-6147.5.patch.txt, HIVE-6147.6.patch.txt
>
>
> Presently, the HBase Hive integration supports querying only primitive data 
> types in columns. It would be nice to be able to store and query Avro objects 
> in HBase columns by making them visible as structs to Hive. This will allow 
> Hive to perform ad hoc analysis of HBase data which can be deeply structured.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
