[
https://issues.apache.org/jira/browse/HIVE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126803#comment-15126803
]
Swarnim Kulkarni commented on HIVE-6147:
----------------------------------------
{quote}
Avro supports schema evolution that allows data to be written with one schema
and read with another
{quote}
Yup, definitely agree. However, the point I was trying to make is that you would
still need to provide the exact same schema that was used when writing the
data. Let's take an example. Say you used schema S1 to write a billion
rows to HBase. The schema then evolved to S2 (hopefully in a compatible way) and
you wrote another billion rows with it. The schema evolved again to S3 and
you wrote another billion rows. Now, to be able to read all this data, this is
what you would need to do:
1st billion rows:
Writer Schema: S1
Reader Schema: S3
2nd billion rows:
Writer Schema: S2
Reader Schema: S3
3rd billion rows:
Writer Schema: S3
Reader Schema: S3
So as you see, you are still providing the *exact same version* of the schema
that was used to write the data to be able to read it back successfully.
Without it, it would be extremely hard for Avro to make head or tail of
our data. You "might" still get lucky and be able to deserialize the 1st
billion rows using S3 as both the reader and writer schema, but there are
absolutely no guarantees whatsoever. Which is why, regardless, you would still
need a way to track which schema was used to persist the data when you read it
back, and the current design of the Hive/HBase Avro support closely follows
that pattern.
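
To make the S1/S2/S3 example concrete, here is a minimal toy sketch in Python. It is not Avro's actual binary encoding or API: the schema tuples and the `encode`/`decode` helpers are hypothetical, made up only for illustration. What it shares with Avro is the relevant property that the bytes carry no field names or tags, so the reader has to know the writer's schema to find the field boundaries.

```python
import struct

# Toy schemas: ordered (field name, type, default) tuples.
# Hypothetical stand-ins for S1, S2 and S3; not real Avro schemas.
S1 = [("id", "long", None)]
S2 = [("id", "long", None), ("name", "string", "")]
S3 = [("id", "long", None), ("name", "string", ""), ("score", "long", 0)]


def encode(record, writer_schema):
    """Serialize values positionally; no field names or tags go into the bytes."""
    out = bytearray()
    for name, ftype, _default in writer_schema:
        value = record[name]
        if ftype == "long":
            out += struct.pack(">q", value)
        else:  # string: 4-byte length prefix followed by UTF-8 bytes
            raw = value.encode("utf-8")
            out += struct.pack(">I", len(raw)) + raw
    return bytes(out)


def decode(data, writer_schema, reader_schema):
    """Parse the bytes with the writer schema, then project onto the reader schema."""
    pos = 0
    written = {}
    for name, ftype, _default in writer_schema:
        if ftype == "long":
            (written[name],) = struct.unpack_from(">q", data, pos)
            pos += 8
        else:
            (length,) = struct.unpack_from(">I", data, pos)
            pos += 4
            if pos + length > len(data):
                raise ValueError("string runs past end of data")
            written[name] = data[pos:pos + length].decode("utf-8")
            pos += length
    if pos != len(data):
        raise ValueError("trailing bytes: wrong writer schema?")
    # Fields the writer did not know about are filled from the reader's defaults.
    return {name: written.get(name, default) for name, _t, default in reader_schema}


row = encode({"id": 7}, S1)      # one of the "1st billion rows"
print(decode(row, S1, S3))       # prints {'id': 7, 'name': '', 'score': 0}

try:
    decode(row, S3, S3)          # pretending the row was written with S3
except (struct.error, ValueError) as exc:
    print("cannot decode:", exc)
```

In this toy format the wrong writer schema fails loudly because the row is too short. With other data it could just as well decode into plausible-looking garbage, which is the "no guarantees" case above.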
> Support avro data stored in HBase columns
> -----------------------------------------
>
> Key: HIVE-6147
> URL: https://issues.apache.org/jira/browse/HIVE-6147
> Project: Hive
> Issue Type: Improvement
> Components: HBase Handler
> Affects Versions: 0.12.0, 0.13.0
> Reporter: Swarnim Kulkarni
> Assignee: Swarnim Kulkarni
> Labels: TODOC14
> Fix For: 0.14.0
>
> Attachments: HIVE-6147.1.patch.txt, HIVE-6147.2.patch.txt,
> HIVE-6147.3.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.4.patch.txt,
> HIVE-6147.5.patch.txt, HIVE-6147.6.patch.txt
>
>
> Presently, the HBase Hive integration supports querying only primitive data
> types in columns. It would be nice to be able to store and query Avro objects
> in HBase columns by making them visible as structs to Hive. This will allow
> Hive to perform ad hoc analysis of HBase data which can be deeply structured.