[
https://issues.apache.org/jira/browse/SPARK-13129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127742#comment-15127742
]
Tao Li commented on SPARK-13129:
--------------------------------
[~srowen] OK, here is some background.
There are two ways to load data into Hive:
1. batch: generate an ORC file and add a partition to the Hive metastore
2. stream: use the Hive HCatalog Streaming API to ingest data into Hive
continuously
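For the batch path (option 1), registering an already written ORC directory comes down to one HiveQL statement. Below is a minimal sketch that only builds that statement as a string. The helper name add_partition_sql, the table name litao.demo, and the location are illustrative assumptions rather than the exact commands used here, and no quoting or escaping of values is attempted.

```python
def add_partition_sql(table, partition, location):
    """Build a HiveQL statement that registers an existing directory as a partition.

    table     -- table name, e.g. "litao.demo" (assumed for illustration)
    partition -- list of (column, value) pairs forming the partition spec
    location  -- HDFS directory that already holds the ORC file(s)
    """
    spec = ", ".join("{}='{}'".format(col, val) for col, val in partition)
    return "ALTER TABLE {} ADD IF NOT EXISTS PARTITION ({}) LOCATION '{}'".format(
        table, spec, location
    )


# Hypothetical values, chosen to resemble the table discussed in this thread.
stmt = add_partition_sql(
    "litao.demo",
    [("logdate", "201602021250")],
    "/user/hive/warehouse/litao.db/demo/logdate=201602021250",
)
print(stmt)
```

With this path the partition directory holds plain ORC files, with no delta subdirectories.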
About stream ingest:
https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest
http://orc.apache.org/docs/acid.html
My problem: when I create a Hive table and ingest data into it the second
(streaming) way, the partition directory on HDFS looks like the listing below.
bucket_00001 is the ORC file, and the Hive metastore stores the transaction
information.
/user/hive/warehouse/litao.db/demo/logdate=201602021250/_orc_acid_version
/user/hive/warehouse/litao.db/demo/logdate=201602021250/delta_0000004_0000006/bucket_00001
/user/hive/warehouse/litao.db/demo/logdate=201602021250/delta_0000004_0000006/bucket_00001_flush_length
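To make the layout above easier to read, here is a small sketch that splits such a path into its parts. It assumes the usual Hive ACID naming, delta_<minTxn>_<maxTxn>/bucket_<N>, so delta_0000004_0000006 would cover transactions 4 through 6. The function name describe_acid_path and the regular expression are illustrative only and are not part of Hive or Spark.

```python
import re

# Assumed naming convention: <key>=<value>/delta_<minTxn>_<maxTxn>/bucket_<N>,
# optionally followed by the "_flush_length" suffix of the streaming side file.
DELTA_FILE = re.compile(
    r"/(?P<key>[^/=]+)=(?P<value>[^/]+)"
    r"/delta_(?P<min_txn>\d+)_(?P<max_txn>\d+)"
    r"/bucket_(?P<bucket>\d+)(?P<side>_flush_length)?$"
)


def describe_acid_path(path):
    """Return the partition, transaction range and bucket encoded in a path.

    Returns None for paths that are not inside a delta directory,
    such as the _orc_acid_version marker file.
    """
    m = DELTA_FILE.search(path)
    if m is None:
        return None
    return {
        "partition": (m.group("key"), m.group("value")),
        "txn_range": (int(m.group("min_txn")), int(m.group("max_txn"))),
        "bucket": int(m.group("bucket")),
        "is_side_file": m.group("side") is not None,
    }


base = "/user/hive/warehouse/litao.db/demo/logdate=201602021250"
paths = [
    base + "/_orc_acid_version",
    base + "/delta_0000004_0000006/bucket_00001",
    base + "/delta_0000004_0000006/bucket_00001_flush_length",
]
for p in paths:
    print(p.rsplit("/", 1)[-1], "->", describe_acid_path(p))
```

The point is that the row data lives one level below the partition directory, inside the delta directory, rather than directly in the partition directory as with the batch path.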
I found that HiveQL can query the table, but Spark SQL can't.
I suspect Spark SQL doesn't recognize tables written through Hive streaming.
> Spark SQL can't query hive table, which is create by Hive HCatalog Streaming
> API
> ---------------------------------------------------------------------------------
>
> Key: SPARK-13129
> URL: https://issues.apache.org/jira/browse/SPARK-13129
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.0
> Environment: hadoop version: 2.5.0-cdh5.3.2
> hive version: 0.13.1
> spark version: 1.6.0
> Reporter: Tao Li
> Labels: hive, orc, sparksql
>
> I created a Hive table using the Hive HCatalog Streaming API.
> https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest
> The table is populated with streaming data by the Flume Hive sink, and I can
> query it from the Hive command line.
> But I can't query it from the spark-sql command line. Is this a Spark SQL bug
> or an unimplemented feature?
> The Hive storage files are in ORC format with ACID support.
> http://orc.apache.org/docs/acid.html