Re: Plans for built-in v2 data sources in Spark 4

Will Raschkowski Wed, 20 Sep 2023 06:51:09 -0700

Thank you for linking that, Dongjoon!

I found SPARK-44518 <https://issues.apache.org/jira/browse/SPARK-44518> in that 
list, which proposes turning Spark's Hive integration into a data source. To 
think out loud: the big gaps between the built-in v1 and v2 data sources are 
support for bucketing and partitioning. The v1 data sources support those 
because they are interleaved with Spark's Hive integration. Separating out that 
Hive integration, or making it more data-source-like, would put us close to 
supporting bucketing and partitioning in v2, and then to defaulting to v2. 
(Just my understanding; curious whether I'm thinking about this correctly.)

Anyway, thank you for the pointer.

From: Dongjoon Hyun <[email protected]>
Date: Friday, 15 September 2023 at 05:36
To: Will Raschkowski <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: Plans for built-in v2 data sources in Spark 4

Hi, Will.

According to the following JIRA, as of now there is no plan or ongoing 
discussion to switch the default.

https://issues.apache.org/jira/browse/SPARK-44111 (Prepare Apache Spark 4.0.0)

Thanks,
Dongjoon.

On Wed, Sep 13, 2023 at 9:02 AM Will Raschkowski 
<[email protected]> wrote:
Hey everyone,

I was wondering what the plans are for Spark's built-in v2 file data sources in 
Spark 4.

Concretely, is the plan for Spark 4 to continue defaulting to the built-in v1 
data sources? And if so, what are the blockers for defaulting to v2? I see, 
as one example, that writing Hive-style partitions is not supported in v2. Are 
there other blockers or outstanding discussions?
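
For context on what "defaulting to v1" means in practice: as far as I know, the 
choice is governed by the internal setting spark.sql.sources.useV1SourceList, 
which lists the built-in sources that stay on the v1 code path. Below is a 
minimal sketch of how one might compute a value that opts selected sources into 
v2. The default list shown is my assumption for recent 3.x releases and should 
be checked against SQLConf for your version; the helper name is hypothetical.

```python
# Assumed default of spark.sql.sources.useV1SourceList in recent Spark 3.x
# releases (verify against SQLConf for the version you run).
DEFAULT_V1_SOURCES = ["avro", "csv", "json", "kafka", "orc", "parquet", "text"]


def v1_source_list_without(v2_sources, v1_sources=DEFAULT_V1_SOURCES):
    """Hypothetical helper: build the config value that keeps every listed
    source on the v1 path except the ones named in v2_sources."""
    opted_in = {s.lower() for s in v2_sources}
    return ",".join(s for s in v1_sources if s not in opted_in)


# Example: try the v2 code path for Parquet and ORC only.
value = v1_source_list_without(["parquet", "orc"])
print(f"--conf spark.sql.sources.useV1SourceList={value}")
```

The printed flag can be passed to spark-submit; an empty value would take all of 
the listed sources off the v1 path, which is where gaps such as partitioned 
writes show up.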

Regards,
Will
