[jira] [Created] (SPARK-47717) Support Hive tables as a streaming source and sink

Adi Suresh (Jira) Wed, 03 Apr 2024 08:09:51 -0700

Adi Suresh created SPARK-47717:
----------------------------------

             Summary: Support Hive tables as a streaming source and sink
                 Key: SPARK-47717
                 URL: https://issues.apache.org/jira/browse/SPARK-47717
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 3.5.1, 3.4.1, 3.3.2
            Reporter: Adi Suresh
             Fix For: 3.5.1, 3.4.1, 3.3.2



Many users keep data in Hive tables, but Spark cannot currently use these 
tables as a streaming source or sink, so there is no native way to stream this 
data in Spark. Current workarounds rely on an intermediary that tracks which 
data has already been read and periodically runs batch jobs over the rest. This 
use case should be supported by Spark's built-in streaming mechanism.
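
For illustration, the kind of intermediary described above might look roughly 
like the following Python sketch. It is not taken from any existing tool: the 
checkpoint file and the names load_processed, run_once, list_partitions and 
process_batch are all hypothetical, and the Spark batch job is stood in for by 
a plain callback. It assumes new data arrives as new Hive partitions.

```python
import json
import tempfile
from pathlib import Path


def load_processed(checkpoint):
    """Return the set of partitions already handled (empty on first run)."""
    if not checkpoint.exists():
        return set()
    return set(json.loads(checkpoint.read_text()))


def run_once(checkpoint, list_partitions, process_batch):
    """One polling cycle: process unseen partitions, then record them.

    list_partitions and process_batch are hypothetical callbacks; in a real
    setup they would query the metastore and launch a Spark batch job.
    """
    processed = load_processed(checkpoint)
    new = sorted(set(list_partitions()) - processed)
    if new:
        process_batch(new)
        # Record progress only after the batch succeeds, so that a failed
        # batch is picked up again on the next cycle.
        checkpoint.write_text(json.dumps(sorted(processed | set(new))))
    return new


def demo():
    """Simulate three polling cycles against an in-memory partition list."""
    partitions = ["dt=2024-04-01", "dt=2024-04-02"]
    batches = []
    with tempfile.TemporaryDirectory() as tmp:
        checkpoint = Path(tmp) / "processed.json"
        first = run_once(checkpoint, lambda: partitions, batches.append)
        partitions.append("dt=2024-04-03")  # a new partition lands
        second = run_once(checkpoint, lambda: partitions, batches.append)
        third = run_once(checkpoint, lambda: partitions, batches.append)
    return first, second, third
```

Every pipeline has to reimplement this bookkeeping and its failure handling, 
which is the work a native streaming source would take over through its own 
offsets and checkpoints.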

 

From some research, Hive itself supports streaming ingest 
[https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest+V2], 
but Spark does not support streaming on tables in Hive format. I don't think 
it makes sense to start copying Hive server-side code into Spark, but we could 
copy only the relevant logic and wrap it in the DataSourceV2 APIs to enable 
this feature. To avoid breaking backwards compatibility, we would probably want 
to gate this behind a new Spark property.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

