[
https://issues.apache.org/jira/browse/HIVE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638965#comment-14638965
]
Swarnim Kulkarni commented on HIVE-6147:
----------------------------------------
[~brocknoland] Apologies for getting back to you so late. Somehow I completely
missed the notification on this.
Using this support is pretty straightforward. An example query looks like this:
{noformat}
CREATE EXTERNAL TABLE test_hbase_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,test_col_fam:test_col",
"test_col_fam.test_col.serialization.type"="avro",
"test_col_fam.test_col.avro.schema.url"="hdfs://testcluster/tmp/schema.avsc")
TBLPROPERTIES ("hbase.table.name" = "hbase_avro_table",
"hbase.struct.autogenerate"="true");
{noformat}
So basically this looks exactly like a query that we would use to query an HBase
table. The differences are in the following properties:
{noformat}
"test_col_fam.test_col.serialization.type"="avro"
{noformat}
Using this property, we are telling Hive that the given column under the given
column family is an Avro column, so it needs to be deserialized accordingly.
{noformat}
"test_col_fam.test_col.avro.schema.url"="hdfs://testcluster/tmp/schema.avsc"
{noformat}
This property specifies where to find the reader schema that will be used to
deserialize the column. The schema can live on HDFS as shown here, or be
provided inline using a property like
"test_col_fam.test_col.avro.schema.literal". If you keep your schemas in a
custom store, you can write a custom implementation of AvroSchemaRetriever[1]
and plug it in using a property like
"test_col_fam.test_col.avro.schema.retriever". Of course, you would need to
ensure that the jar containing this custom class is on the Hive classpath.
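As a rough sketch of the inline variant, the same table could be declared with
the schema passed through the "avro.schema.literal" property instead of a URL.
The table name test_hbase_avro_inline and the record schema (TestRecord with
"name" and "age" fields) are made up purely for illustration. The literal is
wrapped in single quotes so the double quotes inside the JSON need no escaping.
{noformat}
-- Hypothetical table name and Avro schema, for illustration only.
CREATE EXTERNAL TABLE test_hbase_avro_inline
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,test_col_fam:test_col",
"test_col_fam.test_col.serialization.type"="avro",
'test_col_fam.test_col.avro.schema.literal'='{"type":"record","name":"TestRecord","fields":[{"name":"name","type":"string"},{"name":"age","type":"int"}]}')
TBLPROPERTIES ("hbase.table.name" = "hbase_avro_table",
"hbase.struct.autogenerate"="true");
{noformat}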
{noformat}
"hbase.struct.autogenerate"="true"
{noformat}
Avro schemas can be complicated and deeply nested, so manually creating the
columns and types for them is not always feasible. Specifying this property
lets Hive automatically deduce the columns and types from the schema that was
provided.
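To check what was generated, something like the following could be run against
the test_hbase_avro table. This is only a sketch: it assumes the autogenerated
struct column ends up being named test_col, and that the Avro schema has a
top-level field called "name", which is a made-up field for illustration.
{noformat}
-- Show the columns and types that Hive deduced from the Avro schema.
DESCRIBE test_hbase_avro;
-- Assumed column name (test_col) and hypothetical schema field (name).
SELECT test_col.name FROM test_hbase_avro LIMIT 10;
{noformat}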
Please do let me know if there are any more questions that I can help out with.
[1] https://github.com/cloudera/hive/blob/cdh5.3.2-release/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSchemaRetriever.java
> Support avro data stored in HBase columns
> -----------------------------------------
>
> Key: HIVE-6147
> URL: https://issues.apache.org/jira/browse/HIVE-6147
> Project: Hive
> Issue Type: Improvement
> Components: HBase Handler
> Affects Versions: 0.12.0, 0.13.0
> Reporter: Swarnim Kulkarni
> Assignee: Swarnim Kulkarni
> Labels: TODOC14
> Fix For: 0.14.0
>
> Attachments: HIVE-6147.1.patch.txt, HIVE-6147.2.patch.txt,
> HIVE-6147.3.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.4.patch.txt,
> HIVE-6147.5.patch.txt, HIVE-6147.6.patch.txt
>
>
> Presently, the HBase Hive integration supports querying only primitive data
> types in columns. It would be nice to be able to store and query Avro objects
> in HBase columns by making them visible as structs to Hive. This will allow
> Hive to perform ad hoc analysis of HBase data which can be deeply structured.