Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

Mich Talebzadeh Thu, 25 Apr 2024 04:27:55 -0700

I see a statement made as below, and I quote:

"The proposal of SPARK-46122 is to switch the default value of this
configuration from `true` to `false` to use Spark native tables because
we support better."


Can you please elaborate on the above, specifically the phrase
"... because we support better"?

Are you referring to the performance of the Spark catalog (I believe it is
internal) or to integration with Spark?

HTH

Mich Talebzadeh,
Technologist | Architect | Data Engineer  | Generative AI | FinCrime
London
United Kingdom


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, quote "one test result is worth one-thousand
expert opinions" (Wernher von Braun
<https://en.wikipedia.org/wiki/Wernher_von_Braun>).


On Thu, 25 Apr 2024 at 11:17, Wenchen Fan <[email protected]> wrote:

> +1
>
> On Thu, Apr 25, 2024 at 2:46 PM Kent Yao <[email protected]> wrote:
>
>> +1
>>
>> Nit: the umbrella ticket is SPARK-44111, not SPARK-44444.
>>
>> Thanks,
>> Kent Yao
>>
>> Dongjoon Hyun <[email protected]> wrote on Thu, 25 Apr 2024 at 14:39:
>> >
>> > Hi, All.
>> >
>> > It's great to see the community's continued work to polish 4.0.0.
>> > Thank you all.
>> >
>> > I'd like to bring SPARK-46122 (another SQL topic) to your attention from
>> > the subtasks of SPARK-44444 (Prepare Apache Spark 4.0.0):
>> >
>> > - https://issues.apache.org/jira/browse/SPARK-46122
>> >    Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
>> >
>> > This legacy configuration applies to `CREATE TABLE` SQL syntax without
>> > `USING` or `STORED AS`, which currently creates a `Hive` table.
>> > The proposal of SPARK-46122 is to switch the default value of this
>> > configuration from `true` to `false` to use Spark native tables because
>> > we support better.
>> >
>> > In other words, Spark will use the value of `spark.sql.sources.default`
>> > as the table provider instead of `Hive`, as the other Spark APIs do. Of
>> > course, users can restore all the legacy behavior by setting it back to
>> > `true`.
>> >
>> > Historically, this behavior change was already merged once during Apache
>> > Spark 3.0.0 preparation via SPARK-30098, but it was reverted during the
>> > 3.0.0 RC period.
>> >
>> > 2019-12-06: SPARK-30098 Use default datasource as provider for CREATE TABLE
>> > 2020-05-16: SPARK-31707 Revert SPARK-30098 Use default datasource as
>> >             provider for CREATE TABLE command
>> >
>> > For Apache Spark 3.1.0, we had another discussion about this and defined
>> > it as a legacy behavior controlled by this configuration, reusing the ID
>> > SPARK-30098.
>> >
>> > 2020-12-01: https://lists.apache.org/thread/8c8k1jk61pzlcosz3mxo4rkj5l23r204
>> > 2020-12-03: SPARK-30098 Add a configuration to use default datasource as
>> >             provider for CREATE TABLE command
>> >
>> > Last year, we received two additional requests to switch this, because
>> > Apache Spark 4.0.0 is a good time to decide on the future direction.
>> >
>> > 2023-02-27: SPARK-42603 as an independent idea.
>> > 2023-11-27: SPARK-46122 as a part of Apache Spark 4.0.0 idea
>> >
>> >
>> > WDYT? The technical scope is defined in the following PR, which consists
>> > of one line of main code, one line of migration guide, and a few lines of
>> > test code.
>> >
>> > - https://github.com/apache/spark/pull/46207
>> >
>> > Dongjoon.
>>
>>
>>
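For readers weighing the change, here is a minimal Python sketch of the provider-selection rule as Dongjoon describes it in the thread. It is a simplified model, not Spark's implementation: the function `resolve_provider` and its parameters are hypothetical, `create_hive_table_by_default` stands in for `spark.sql.legacy.createHiveTableByDefault`, and `sources_default` stands in for `spark.sql.sources.default` (whose stock value is `parquet`). It assumes only `USING` and `STORED AS` matter and ignores other Hive-specific clauses that real Spark may also consider.

```python
from typing import Optional


# Hypothetical model of the CREATE TABLE provider choice discussed in
# SPARK-46122. This is NOT Spark's code; it ignores other Hive-specific
# clauses (e.g. ROW FORMAT) that may also affect real Spark.
def resolve_provider(
    using: Optional[str] = None,
    stored_as: Optional[str] = None,
    create_hive_table_by_default: bool = True,
    sources_default: str = "parquet",
) -> str:
    """Return the table provider for a modeled CREATE TABLE statement.

    using: value of the USING clause, if present.
    stored_as: value of the STORED AS clause, if present.
    create_hive_table_by_default: models spark.sql.legacy.createHiveTableByDefault.
    sources_default: models spark.sql.sources.default.
    """
    if using is not None and stored_as is not None:
        # This model treats the two clauses as mutually exclusive.
        raise ValueError("USING and STORED AS are mutually exclusive in this model")
    if using is not None:
        # An explicit data source is used as-is; the flag does not apply.
        return using
    if stored_as is not None:
        # STORED AS always yields a Hive-format table; the flag does not apply.
        return "hive"
    # Neither clause given: the only case the flag controls.
    return "hive" if create_hive_table_by_default else sources_default


if __name__ == "__main__":
    # CREATE TABLE t (id INT) under the legacy default (true)
    print(resolve_provider())
    # The same statement under the proposed default (false)
    print(resolve_provider(create_hive_table_by_default=False))
    # Explicit clauses are unaffected by the flag
    print(resolve_provider(using="orc", create_hive_table_by_default=False))
    print(resolve_provider(stored_as="ORC", create_hive_table_by_default=False))
```

In this model only the final branch depends on the flag: statements with an explicit `USING` or `STORED AS` resolve the same way under either default, and setting the flag back to `true` restores the Hive provider for the bare `CREATE TABLE` case.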
