[ https://issues.apache.org/jira/browse/HIVE-6147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123690#comment-15123690 ]
Ilya Kats commented on HIVE-6147:
---------------------------------
I'm trying to create a table in Hive 0.14 that points to an HBase table with
one column family ("c") and one column ("b") that contains a schema-less
Avro-serialized object:
{code:sql}
CREATE EXTERNAL TABLE customers
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = ":key,c:b",
"c.b.serialization.type"="avro",
"c.b.avro.schema.url"="hdfs:/....../Customer.avsc")
TBLPROPERTIES ("hbase.table.name" = "customers",
"hbase.struct.autogenerate"="true",
"hive.serialization.extend.nesting.levels"="true");
{code}
The DDL above creates the table successfully, but queries fail with the
following error:
{code}
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating c_b
16/01/29 15:36:55 [main]: ERROR CliDriver: Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating c_b
java.io.IOException: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating c_b
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:152)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1621)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:267)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating c_b
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:82)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:571)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:563)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
    ... 12 more
Caused by: org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorException: An error occurred retrieving schema from bytes
    at org.apache.hadoop.hive.serde2.avro.AvroLazyObjectInspector.retrieveSchemaFromBytes(AvroLazyObjectInspector.java:331)
    at org.apache.hadoop.hive.serde2.avro.AvroLazyObjectInspector.deserializeStruct(AvroLazyObjectInspector.java:287)
    at org.apache.hadoop.hive.serde2.avro.AvroLazyObjectInspector.getStructFieldData(AvroLazyObjectInspector.java:142)
    at org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldData(LazySimpleStructObjectInspector.java:109)
    at org.apache.hadoop.hive.serde2.objectinspector.DelegatedStructObjectInspector.getStructFieldData(DelegatedStructObjectInspector.java:88)
    at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:77)
    ... 17 more
Caused by: java.io.IOException: Not a data file.
    at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
    at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84)
    at org.apache.hadoop.hive.serde2.avro.AvroLazyObjectInspector.retrieveSchemaFromBytes(AvroLazyObjectInspector.java:328)
    ... 25 more
{code}
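For context on the innermost {{Not a data file}} cause: as far as I understand it, {{DataFileStream}} expects an Avro object container file, which per the Avro specification begins with the four magic bytes 'O', 'b', 'j', 1 followed by metadata that embeds the writer schema. A schema-less value is just the binary-encoded datum with no such header. The sketch below is only an illustration of that difference using plain Java; the class and method names are hypothetical and not part of Hive or Avro.

```java
import java.util.Arrays;

// Hypothetical helper (not a Hive or Avro class): checks whether a payload
// starts with the Avro object container file magic bytes.
final class AvroContainerMagic {
    // Magic per the Avro specification: ASCII 'O', 'b', 'j', followed by 1.
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    static boolean hasContainerHeader(byte[] payload) {
        if (payload == null || payload.length < MAGIC.length) {
            return false;
        }
        // Compare only the first four bytes against the magic.
        return Arrays.equals(Arrays.copyOf(payload, MAGIC.length), MAGIC);
    }

    public static void main(String[] args) {
        // Starts with the container magic (the rest is dummy padding).
        byte[] containerLike = {'O', 'b', 'j', 1, 0, 0};
        // A bare Avro binary string "foo": zigzag-encoded length 3 (0x06), then the bytes.
        byte[] bareDatum = {0x06, 'f', 'o', 'o'};

        System.out.println(hasContainerHeader(containerLike)); // true
        System.out.println(hasContainerHeader(bareDatum));     // false
    }
}
```

A cell written as a bare datum, like {{bareDatum}} here, has nothing for a container-file reader to extract a schema from, which would be consistent with the exception above.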
It seems that there is a problem in the following code in
AvroLazyObjectInspector:
{code}
...
private Object deserializeStruct(Object struct, String fieldName) {
  ...
  if (readerSchema == null) {
    ...
  } else {
    // a reader schema was provided
    if (schemaRetriever != null) {
      // a schema retriever has been provided as well. Attempt to read the write schema from the
      // retriever
      ws = schemaRetriever.retrieveWriterSchema(data);
      if (ws == null) {
        throw new IllegalStateException(
            "Null writer schema retrieved from schemaRetriever for field [" + fieldName + "]");
      }
    } else {
      // attempt retrieving the schema from the data
      ws = retrieveSchemaFromBytes(data);
    }
    rs = readerSchema;
    try {
      avroWritable.readFields(data, ws, rs);
    } catch (IOException ioe) {
      throw new AvroObjectInspectorException("Error deserializing avro payload", ioe);
    }
  }
  ...
}
...
{code}
The problem appears to be that it tries to retrieve the writer schema from the
data ({{ws = retrieveSchemaFromBytes(data)}}) even when a schema URL (the
reader schema) has been provided. Is there a way to make it work for
schema-less Avro data?
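To make the question concrete, the branch quoted above can be modelled as a small decision function. This is only a sketch under stated assumptions: the class, enum and parameter names are hypothetical, and the {{READER_SCHEMA_FALLBACK}} outcome is the behaviour being asked about, not something the quoted Hive code does.

```java
// Hypothetical model of how the writer schema source is chosen once a
// reader schema has been supplied. Not Hive code.
final class WriterSchemaChoice {
    enum Source { SCHEMA_RETRIEVER, PAYLOAD_HEADER, READER_SCHEMA_FALLBACK }

    static Source choose(boolean hasRetriever,
                         boolean payloadHasHeader,
                         boolean allowReaderFallback) {
        if (hasRetriever) {
            // Mirrors the quoted code: a configured retriever takes precedence.
            return Source.SCHEMA_RETRIEVER;
        }
        if (!payloadHasHeader && allowReaderFallback) {
            // Assumed extension: treat the reader schema as the writer schema
            // when the payload carries no embedded schema.
            return Source.READER_SCHEMA_FALLBACK;
        }
        // Mirrors the quoted code: read the schema from the payload bytes.
        return Source.PAYLOAD_HEADER;
    }

    public static void main(String[] args) {
        // Behaviour as quoted: no retriever, schema-less payload, no fallback.
        System.out.println(choose(false, false, false)); // PAYLOAD_HEADER
        // Behaviour being asked for: fall back to the reader schema.
        System.out.println(choose(false, false, true));  // READER_SCHEMA_FALLBACK
    }
}
```

With {{allowReaderFallback}} set to false the model reproduces the failing case from this comment: the payload bytes are consulted even though they contain no schema.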
> Support avro data stored in HBase columns
> -----------------------------------------
>
> Key: HIVE-6147
> URL: https://issues.apache.org/jira/browse/HIVE-6147
> Project: Hive
> Issue Type: Improvement
> Components: HBase Handler
> Affects Versions: 0.12.0, 0.13.0
> Reporter: Swarnim Kulkarni
> Assignee: Swarnim Kulkarni
> Labels: TODOC14
> Fix For: 0.14.0
>
> Attachments: HIVE-6147.1.patch.txt, HIVE-6147.2.patch.txt,
> HIVE-6147.3.patch.txt, HIVE-6147.3.patch.txt, HIVE-6147.4.patch.txt,
> HIVE-6147.5.patch.txt, HIVE-6147.6.patch.txt
>
>
> Presently, the HBase Hive integration supports querying only primitive data
> types in columns. It would be nice to be able to store and query Avro objects
> in HBase columns by making them visible as structs to Hive. This will allow
> Hive to perform ad hoc analysis of HBase data which can be deeply structured.