[jira] [Commented] (HIVE-6147) Support avro data stored in HBase columns

Swarnim Kulkarni (JIRA) Mon, 01 Feb 2016 11:00:09 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126803#comment-15126803
 ]


Swarnim Kulkarni commented on HIVE-6147:
----------------------------------------

{quote}
Avro supports schema evolution that allows data to be written with one schema 
and read with another
{quote}

Yup. Definitely agree. However, the point I was trying to make is that you would
still need to provide the exact schema that was used when writing the
data. Let's take an example. Say you used schema S1 to write a billion
rows to HBase. The schema then evolved to S2 (hopefully in a compatible way) and
you wrote another billion rows with it. The schema evolved again to S3 and
you wrote another billion rows. Now, to be able to read all this data, this is
what you would need to do:

1st billion rows:

Writer Schema: S1
Reader Schema: S3

2nd billion rows:

Writer Schema: S2
Reader Schema: S3

3rd billion rows:

Writer Schema: S3
Reader Schema: S3

So as you see, you are still providing the *exact same version* of the schema
that was used to write the data to be able to read it back successfully.
Without it, it would be extremely hard for Avro to make head or tail of
our data. You "might" still get lucky and be able to deserialize the 1st
billion rows using S3 as both reader and writer schema, but there are
absolutely no guarantees whatsoever. That is why you would still need a way to
track which schema was used to persist the data when you read it back,
and the current design of the Hive/HBase Avro support closely follows that pattern.
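
To make the point concrete, here is a minimal sketch in plain Python. It is not
the Avro library: it is a toy codec that only mimics how Avro's binary encoding
lays out longs (zigzag varints) and strings (length-prefixed UTF-8) with no
field names or tags in the bytes. The schemas S1 and S3, their fields, and all
function names are hypothetical and exist only for illustration.

```python
import io

def write_long(buf, n):
    # Zigzag-encode, then emit 7 bits per byte, low-order group first.
    z = (n << 1) ^ (n >> 63)
    while z & ~0x7F:
        buf.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    buf.write(bytes([z]))

def read_long(buf):
    shift, z = 0, 0
    while True:
        b = buf.read(1)
        if not b:
            raise EOFError("ran out of bytes while reading a long")
        z |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)

def write_string(buf, s):
    raw = s.encode("utf-8")
    write_long(buf, len(raw))
    buf.write(raw)

def read_string(buf):
    n = read_long(buf)
    raw = buf.read(n)
    if len(raw) != n:
        raise EOFError("ran out of bytes while reading a string")
    return raw.decode("utf-8")

WRITERS = {"long": write_long, "string": write_string}
READERS = {"long": read_long, "string": read_string}

# Hypothetical schemas: a list of (field name, type, default) tuples.
S1 = [("id", "long", None), ("name", "string", None)]
S3 = [("id", "long", None), ("age", "long", -1), ("name", "string", None)]

def encode(record, writer_schema):
    # Values are written back to back, in writer-schema order, with no tags.
    buf = io.BytesIO()
    for name, typ, _ in writer_schema:
        WRITERS[typ](buf, record[name])
    return buf.getvalue()

def decode(data, writer_schema):
    # The bytes can only be split into fields by walking the writer schema.
    buf = io.BytesIO(data)
    return {name: READERS[typ](buf) for name, typ, _ in writer_schema}

def read_record(data, writer_schema, reader_schema):
    # Decode with the writer schema, then resolve by name to the reader
    # schema, filling defaults for fields the writer did not know about.
    written = decode(data, writer_schema)
    return {name: written.get(name, default)
            for name, _, default in reader_schema}

if __name__ == "__main__":
    row = encode({"id": 7, "name": "bob"}, S1)
    print(read_record(row, S1, S3))   # writer S1, reader S3
    try:
        print(decode(row, S3))        # pretending S3 wrote the row
    except EOFError as err:
        print("decode failed:", err)
```

In this toy, reading the S1 row with writer schema S1 and reader schema S3
resolves cleanly, while treating S3 as the writer schema misparses the bytes.
With other data the misparse could also succeed silently and return wrong
values, which is the "might get lucky" case. In the Avro Java API the same
pairing is expressed by passing both schemas to
`GenericDatumReader(Schema writer, Schema reader)`.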

> Support avro data stored in HBase columns
> -----------------------------------------
>
>                 Key: HIVE-6147
>                 URL: https://issues.apache.org/jira/browse/HIVE-6147
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>    Affects Versions: 0.12.0, 0.13.0
>            Reporter: Swarnim Kulkarni
>            Assignee: Swarnim Kulkarni
>              Labels: TODOC14
>             Fix For: 0.14.0
>
>         Attachments: HIVE-6147.1.patch.txt, HIVE-6147.2.patch.txt, 
> HIVE-6147.3.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.4.patch.txt, 
> HIVE-6147.5.patch.txt, HIVE-6147.6.patch.txt
>
>
> Presently, the HBase Hive integration supports querying only primitive data 
> types in columns. It would be nice to be able to store and query Avro objects 
> in HBase columns by making them visible as structs to Hive. This will allow 
> Hive to perform ad hoc analysis of HBase data which can be deeply structured.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
