Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

Wenchen Fan Thu, 25 Apr 2024 07:08:08 -0700

It's about the data source. For example, Spark's built-in Parquet
reader/writer is faster than the Hive SerDe Parquet reader/writer.


On Thu, Apr 25, 2024 at 9:55 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> I see the statement below, and I quote:
>
> "The proposal of SPARK-46122 is to switch the default value of this
> configuration from `true` to `false` to use Spark native tables because
> we support better."
>
> Can you please elaborate on the above, specifically with regard to the
> phrase "... because we support better"?
>
> Are you referring to the performance of the Spark catalog (I believe it is
> internal) or to integration with Spark?
>
> HTH
>
> Mich Talebzadeh,
> Technologist | Architect | Data Engineer | Generative AI | FinCrime
> London
> United Kingdom
>
>
>    View my LinkedIn profile:
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>)".
>
>
> On Thu, 25 Apr 2024 at 11:17, Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> +1
>>
>> On Thu, Apr 25, 2024 at 2:46 PM Kent Yao <y...@apache.org> wrote:
>>
>>> +1
>>>
>>> Nit: the umbrella ticket is SPARK-44111, not SPARK-44444.
>>>
>>> Thanks,
>>> Kent Yao
>>>
>>> On Thu, Apr 25, 2024 at 2:39 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>> >
>>> > Hi, All.
>>> >
>>> > It's great to see community activities to polish 4.0.0 more and more.
>>> > Thank you all.
>>> >
>>> > I'd like to bring SPARK-46122 (another SQL topic) to you from the subtasks
>>> > of SPARK-44444 (Prepare Apache Spark 4.0.0):
>>> >
>>> > - https://issues.apache.org/jira/browse/SPARK-46122
>>> >    Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
>>> >
>>> > This legacy configuration is about `CREATE TABLE` SQL syntax without
>>> > `USING` or `STORED AS`, which is currently mapped to a `Hive` table.
>>> > The proposal of SPARK-46122 is to switch the default value of this
>>> > configuration from `true` to `false` to use Spark native tables because
>>> > we support better.
>>> >
>>> > In other words, like the other Spark APIs, Spark will use the value of
>>> > `spark.sql.sources.default` as the table provider instead of `Hive`.
>>> > Of course, users can restore all the legacy behavior by setting the
>>> > configuration back to `true`.
>>> >
>>> > Historically, this behavior change was already merged once, via
>>> > SPARK-30098, during Apache Spark 3.0.0 preparation, but it was reverted
>>> > during the 3.0.0 RC period.
>>> >
>>> > 2019-12-06: SPARK-30098 Use default datasource as provider for CREATE TABLE
>>> > 2020-05-16: SPARK-31707 Revert SPARK-30098 Use default datasource as
>>> >             provider for CREATE TABLE command
>>> >
>>> > For Apache Spark 3.1.0, we had another discussion about this and defined
>>> > it as a legacy behavior controlled by this configuration, reusing the
>>> > ID SPARK-30098.
>>> >
>>> > 2020-12-01: https://lists.apache.org/thread/8c8k1jk61pzlcosz3mxo4rkj5l23r204
>>> > 2020-12-03: SPARK-30098 Add a configuration to use default datasource as
>>> >             provider for CREATE TABLE command
>>> >
>>> > Last year, we received two additional requests to switch this, because
>>> > Apache Spark 4.0.0 is a good time to decide the future direction.
>>> >
>>> > 2023-02-27: SPARK-42603 as an independent idea.
>>> > 2023-11-27: SPARK-46122 as part of the Apache Spark 4.0.0 ideas
>>> >
>>> >
>>> > WDYT? The technical scope is defined in the following PR, which consists
>>> > of one line of main code, one line of migration guide, and a few lines of
>>> > test code.
>>> >
>>> > - https://github.com/apache/spark/pull/46207
>>> >
>>> > Dongjoon.
>>>
>>>
>>>
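The provider rule described in Dongjoon's proposal can be summarized with a small model. The following is a hypothetical Python sketch, not Spark's implementation: the function `resolve_provider` and its parameter names are illustrative assumptions, and only the configuration names `spark.sql.legacy.createHiveTableByDefault` and `spark.sql.sources.default` come from the thread. It assumes `spark.sql.sources.default` keeps its usual default of `parquet`, and it only covers the clauses mentioned in the thread (`USING` and `STORED AS`).

```python
from typing import Optional


def resolve_provider(
    using: Optional[str],
    stored_as: Optional[str],
    legacy_create_hive_table_by_default: bool,
    sources_default: str = "parquet",
) -> str:
    """Hypothetical model of the CREATE TABLE provider rule discussed above.

    Illustrative sketch only; this is not Spark's source code.
    """
    if using is not None and stored_as is not None:
        # This sketch simply treats mixing the two clauses as invalid.
        raise ValueError("specify either USING or STORED AS, not both")
    if using is not None:
        # An explicit USING clause names the provider directly.
        return using
    if stored_as is not None:
        # STORED AS is Hive syntax, so the table stays a Hive table.
        return "hive"
    if legacy_create_hive_table_by_default:
        # Legacy behavior (the default `true` before the proposal): a Hive table.
        return "hive"
    # Proposed behavior: use spark.sql.sources.default, like the other Spark APIs.
    return sources_default


# A plain `CREATE TABLE t (id INT)` under both settings of the legacy flag.
print(resolve_provider(None, None, legacy_create_hive_table_by_default=True))   # hive
print(resolve_provider(None, None, legacy_create_hive_table_by_default=False))  # parquet
```

In this model only the bare `CREATE TABLE` case changes with the flag; statements with an explicit `USING` or `STORED AS` resolve the same way either way. Users who need the old behavior can set the flag back, for example with `SET spark.sql.legacy.createHiveTableByDefault=true;` in Spark SQL.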
