[DISCUSS] Supporting hive on DataSourceV2

JackyLee Mon, 23 Mar 2020 04:20:46 -0700

Hi devs,
I’d like to start a discussion about supporting Hive on DataSourceV2. We’re
working on a project that uses DataSourceV2 to provide multi-source
support. It works very well with our data lake solution, but it does not
yet support HiveTable.


There are 3 reasons why we need to support Hive on DataSourceV2.
1. Hive itself is one of Spark's data sources.
2. HiveTable is essentially a FileTable with its own input and output
formats, so it works fine with FileTable.
3. HiveTable should be stateless, and users should be able to freely read
from or write to Hive using batch or micro-batch.

We implemented stateless Hive on DataSourceV1. It lets users write into
Hive in streaming or batch mode, and it has been widely used in our company.
Recently, we have been working on supporting Hive on DataSourceV2; multiple
Hive catalogs and DDL commands are already supported.

Looking forward to more discussions on this.




