Thank you for linking that, Dongjoon!

In that list I found SPARK-44518<https://issues.apache.org/jira/browse/SPARK-44518>, 
which proposes turning Spark’s Hive integration into a data source. To think 
out loud: the big gaps between the built-in v1 and v2 data sources are support 
for bucketing and partitioning. And the reason the v1 data sources support those 
is that they’re kind of interleaved with Spark’s Hive integration. Separating 
out that Hive integration, or making it more data-source-like, would put us 
close to supporting bucketing and partitioning in v2 and then defaulting to v2. 
(Just my understanding; curious if I’m thinking about this correctly.)

Anyway, thank you for the pointer.

From: Dongjoon Hyun <dongjoon.h...@gmail.com>
Date: Friday, 15 September 2023 at 05:36
To: Will Raschkowski <wraschkow...@palantir.com.invalid>
Cc: dev@spark.apache.org <dev@spark.apache.org>
Subject: Re: Plans for built-in v2 data sources in Spark 4

Hi, Will.

According to the following JIRA, there is currently no plan or ongoing 
discussion to switch the default.

https://issues.apache.org/jira/browse/SPARK-44111 (Prepare Apache Spark 4.0.0)

Thanks,
Dongjoon.


On Wed, Sep 13, 2023 at 9:02 AM Will Raschkowski 
<wraschkow...@palantir.com.invalid> wrote:
Hey everyone,

I was wondering what the plans are for Spark's built-in v2 file data sources in 
Spark 4.

Concretely, is the plan for Spark 4 to continue defaulting to the built-in v1 
data sources? And if so, what are the blockers for defaulting to v2? I see, 
just as an example, that writing Hive partitions is not supported in v2. Are 
there other blockers or outstanding discussions?

Regards,
Will
