[jira] [Commented] (SPARK-13129) Spark SQL can't query hive table, which is create by Hive HCatalog Streaming API

Tao Li (JIRA) Mon, 01 Feb 2016 22:18:07 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-13129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127742#comment-15127742
 ]


Tao Li commented on SPARK-13129:
--------------------------------

[~srowen] OK, here is some background.
There are two ways to load data into Hive:
1. Batch: generate an ORC file and add a partition to the Hive metastore.
2. Streaming: ingest data into Hive continuously through the Hive HCatalog 
Streaming API.
For details on streaming ingest, see:
https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest
http://orc.apache.org/docs/acid.html

My problem is this: when I create a Hive table and ingest data into it using 
the second (streaming) method, the partition directory on HDFS looks like the 
listing below. bucket_00001 is the ORC file, and the Hive metastore stores the 
transaction information.
/user/hive/warehouse/litao.db/demo/logdate=201602021250/_orc_acid_version
/user/hive/warehouse/litao.db/demo/logdate=201602021250/delta_0000004_0000006/bucket_00001
/user/hive/warehouse/litao.db/demo/logdate=201602021250/delta_0000004_0000006/bucket_00001_flush_length
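
For illustration, the layout above can be decoded with a small script. This is 
only a sketch: the names AcidFile and parse_acid_path are hypothetical, and the 
delta_<min txn>_<max txn>/bucket_<n> naming is assumed from the ORC ACID 
documentation linked above. It does not read any ORC data; it only splits the 
path into its parts.

```python
import re
from collections import namedtuple

# Hypothetical record type for one file inside an ACID delta directory.
AcidFile = namedtuple(
    "AcidFile", "partition min_txn max_txn bucket is_flush_length"
)

# Assumed layout: <partition dir>/delta_<min txn>_<max txn>/bucket_<n>
# with an optional "_flush_length" suffix for the streaming side file.
_DELTA_RE = re.compile(
    r"^(?P<partition>.*)/delta_(?P<min>\d+)_(?P<max>\d+)"
    r"/bucket_(?P<bucket>\d+)(?P<side>_flush_length)?$"
)


def parse_acid_path(path):
    """Return an AcidFile for a delta bucket path, or None for anything else."""
    m = _DELTA_RE.match(path)
    if m is None:
        return None
    return AcidFile(
        partition=m.group("partition"),
        min_txn=int(m.group("min")),
        max_txn=int(m.group("max")),
        bucket=int(m.group("bucket")),
        is_flush_length=m.group("side") is not None,
    )


if __name__ == "__main__":
    base = "/user/hive/warehouse/litao.db/demo/logdate=201602021250"
    for p in (
        base + "/_orc_acid_version",
        base + "/delta_0000004_0000006/bucket_00001",
        base + "/delta_0000004_0000006/bucket_00001_flush_length",
    ):
        print(p, "->", parse_acid_path(p))
```

Read this way, the delta directory in my partition covers transactions 4 
through 6 for bucket 1. A reader that expects plain ORC files directly under 
the partition directory would not find them at that level.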

I found that HiveQL can query the table, but Spark SQL can't.
I suspect Spark SQL doesn't recognize tables written by Hive streaming ingest.

> Spark SQL can't query hive table, which is create by Hive HCatalog Streaming 
> API 
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-13129
>                 URL: https://issues.apache.org/jira/browse/SPARK-13129
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.0
>         Environment: hadoop version: 2.5.0-cdh5.3.2
> hive version: 0.13.1
> spark version: 1.6.0
>            Reporter: Tao Li
>              Labels: hive, orc, sparksql
>
> I created a Hive table using the Hive HCatalog Streaming API.
> https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest
> The table is populated with streaming data by the Flume Hive sink, and I can 
> query it from the Hive command line.
> But I can't query the table from the spark-sql command line. Is this a Spark 
> SQL bug or an unimplemented feature?
> The Hive storage files are in ORC format with ACID support.
> http://orc.apache.org/docs/acid.html



