Re: Plans for built-in v2 data sources in Spark 4

Dongjoon Hyun Wed, 20 Sep 2023 13:14:28 -0700

Instead of that, if the question is "Concretely, is the plan for Spark 4
to continue defaulting to the built-in v1 data sources?", I believe the
setting you are looking for is `spark.sql.sources.useV1SourceList`.


Here is the code.

https://github.com/apache/spark/blob/324a07b534ac8c2e83a50ac5ea4c5d93fd57b790/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L3148-L3155
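
As a rough illustration of how that setting can be used: it holds a
comma-separated list of source names that fall back to the v1 path, so
removing a name from it should route that built-in source through v2. The
sketch below only builds the config value. The default list is what I
understand it to be at the linked commit (please verify it against
SQLConf.scala for your version), and `v1_list_without` is a hypothetical
helper, not part of Spark.

```python
from typing import Iterable

# Assumption: default of spark.sql.sources.useV1SourceList at the linked
# commit. Check SQLConf.scala for the Spark version you actually run.
DEFAULT_V1_SOURCE_LIST = "avro,csv,json,kafka,orc,parquet,text"


def v1_list_without(v1_list: str, opt_in_to_v2: Iterable[str]) -> str:
    """Return v1_list with the given short names removed (hypothetical helper)."""
    # Compare names case-insensitively and ignore surrounding whitespace.
    drop = {name.strip().lower() for name in opt_in_to_v2}
    kept = [
        entry.strip()
        for entry in v1_list.split(",")
        if entry.strip() and entry.strip().lower() not in drop
    ]
    return ",".join(kept)


if __name__ == "__main__":
    value = v1_list_without(DEFAULT_V1_SOURCE_LIST, ["parquet"])
    # The result can be passed when starting Spark, for example:
    #   spark-submit --conf spark.sql.sources.useV1SourceList=<value> ...
    print(f"spark.sql.sources.useV1SourceList={value}")
```

Note that, as far as I know, this config is marked internal, so it is
better treated as an experiment switch than as a supported way to opt in
to v2.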

Dongjoon.



On Wed, Sep 20, 2023 at 5:47 AM Will Raschkowski <[email protected]>
wrote:

> Thank you for linking that, Dongjoon!
>
>
>
> I found SPARK-44518 <https://issues.apache.org/jira/browse/SPARK-44518> in
> that list which wants to turn Spark’s Hive integration into a data source.
> IIUC, that’s very related but I’m curious if I’m thinking about this
> correctly:
>
>
>
> The big gaps between the built-in v1 and v2 data sources are support for
> bucketing and partitioning. The reason v1 data sources support those is
> that the v1 paths are interleaved with Spark’s Hive integration. As I
> understand it, separating that Hive integration, or making it more data
> source-like, would put us closer to supporting bucketing and partitioning
> in v2 and then defaulting to v2.
>
>
>
> *From: *Dongjoon Hyun <[email protected]>
> *Date: *Friday, 15 September 2023 at 05:36
> *To: *Will Raschkowski <[email protected]>
> *Cc: *[email protected] <[email protected]>
> *Subject: *Re: Plans for built-in v2 data sources in Spark 4
>
> Hi, Will.
>
> According to the following JIRA, as of now, there is no plan or ongoing
> discussion to switch it.
>
> https://issues.apache.org/jira/browse/SPARK-44111
> (Prepare Apache Spark 4.0.0)
>
> Thanks,
> Dongjoon.
>
>
>
>
>
> On Wed, Sep 13, 2023 at 9:02 AM Will Raschkowski
> <[email protected]> wrote:
>
> Hey everyone,
>
>
>
> I was wondering what the plans are for Spark's built-in v2 file data
> sources in Spark 4.
>
>
>
> Concretely, is the plan for Spark 4 to continue defaulting to the built-in
> v1 data sources? And if yes, what are the blockers for defaulting to v2? I
> see, just as an example, that writing Hive partitions is not supported in v2.
> Are there other blockers or outstanding discussions?
>
>
>
> Regards,
>
> Will
>
>
>
>
