Hi devs, I'd like to start a discussion about supporting Hive on DataSourceV2. We are working on a project that uses DataSourceV2 to support multiple sources. It works very well with our data lake solution, but it does not yet support HiveTable.
There are three reasons why we need to support Hive on DataSourceV2:
1. Hive itself is one of Spark's data sources.
2. HiveTable is essentially a FileTable with its own input and output formats, so it works fine with FileTable.
3. HiveTable should be stateless, so that users can freely read or write Hive in batch or micro-batch mode.

We have implemented stateless Hive on DataSourceV1. It lets users write into Hive in streaming or batch mode, and it has been widely used in our company. Recently we have been working on supporting Hive on DataSourceV2; multiple Hive catalogs and DDL commands are already supported.

Looking forward to more discussions on this.