ok thanks got it

Mich Talebzadeh,
Technologist | Architect | Data Engineer | Generative AI | FinCrime
London
United Kingdom
view my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice: "one test result is worth one-thousand expert
opinions" (Wernher von Braun
<https://en.wikipedia.org/wiki/Wernher_von_Braun>).


On Thu, 25 Apr 2024 at 15:07, Wenchen Fan <cloud0...@gmail.com> wrote:

> It's for the data source. For example, Spark's built-in Parquet
> reader/writer is faster than the Hive serde Parquet reader/writer.
>
> On Thu, Apr 25, 2024 at 9:55 PM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> I see a statement made as below, and I quote:
>>
>> "The proposal of SPARK-46122 is to switch the default value of this
>> configuration from `true` to `false` to use Spark native tables because
>> we support better."
>>
>> Can you please elaborate on the above, specifically with regard to the
>> phrase "... because we support better"?
>>
>> Are you referring to the performance of the Spark catalog (I believe it
>> is internal) or to integration with Spark?
>>
>> HTH
>>
>> Mich Talebzadeh
>>
>> On Thu, 25 Apr 2024 at 11:17, Wenchen Fan <cloud0...@gmail.com> wrote:
>>
>>> +1
>>>
>>> On Thu, Apr 25, 2024 at 2:46 PM Kent Yao <y...@apache.org> wrote:
>>>
>>>> +1
>>>>
>>>> Nit: the umbrella ticket is SPARK-44111, not SPARK-44444.
>>>>
>>>> Thanks,
>>>> Kent Yao
>>>>
>>>> Dongjoon Hyun <dongjoon.h...@gmail.com> wrote on Thu, 25 Apr 2024 at 14:39:
>>>> >
>>>> > Hi, All.
>>>> >
>>>> > It's great to see community activities to polish 4.0.0 more and more.
>>>> > Thank you all.
>>>> >
>>>> > I'd like to bring SPARK-46122 (another SQL topic) to you from the
>>>> > subtasks of SPARK-44444 (Prepare Apache Spark 4.0.0).
>>>> >
>>>> > - https://issues.apache.org/jira/browse/SPARK-46122
>>>> >   Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
>>>> >
>>>> > This legacy configuration is about the `CREATE TABLE` SQL syntax
>>>> > without `USING` and `STORED AS`, which is currently mapped to a `Hive`
>>>> > table. The proposal of SPARK-46122 is to switch the default value of
>>>> > this configuration from `true` to `false` to use Spark native tables
>>>> > because we support better.
>>>> >
>>>> > In other words, Spark will use the value of `spark.sql.sources.default`
>>>> > as the table provider instead of `Hive`, like the other Spark APIs. Of
>>>> > course, users can get all the legacy behavior back by setting it to
>>>> > `true` again.
>>>> >
>>>> > Historically, this behavior change was merged once during the Apache
>>>> > Spark 3.0.0 preparation via SPARK-30098, but was reverted during the
>>>> > 3.0.0 RC period.
>>>> >
>>>> > 2019-12-06: SPARK-30098 Use default datasource as provider for CREATE
>>>> >   TABLE
>>>> > 2020-05-16: SPARK-31707 Revert SPARK-30098 Use default datasource as
>>>> >   provider for CREATE TABLE command
>>>> >
>>>> > At Apache Spark 3.1.0, we had another discussion about this and defined
>>>> > it as a legacy behavior behind this configuration, via the reused ID
>>>> > SPARK-30098.
>>>> >
>>>> > 2020-12-01: https://lists.apache.org/thread/8c8k1jk61pzlcosz3mxo4rkj5l23r204
>>>> > 2020-12-03: SPARK-30098 Add a configuration to use default datasource
>>>> >   as provider for CREATE TABLE command
>>>> >
>>>> > Last year, we received two additional requests to switch this, because
>>>> > Apache Spark 4.0.0 is a good time to make a decision for the future
>>>> > direction.
>>>> >
>>>> > 2023-02-27: SPARK-42603 as an independent idea.
>>>> > 2023-11-27: SPARK-46122 as a part of the Apache Spark 4.0.0 idea.
>>>> >
>>>> > WDYT? The technical scope is defined in the following PR, which is one
>>>> > line of main code, one line of migration guide, and a few lines of
>>>> > test code.
>>>> >
>>>> > - https://github.com/apache/spark/pull/46207
>>>> >
>>>> > Dongjoon.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
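[For readers following the thread, the behavior change under discussion can be sketched in Spark SQL. This is an illustrative sketch, not part of the proposal: the table names are made up, the `Provider` values shown assume the stock default `spark.sql.sources.default=parquet`, and in some deployments the legacy flag may need to be set before the session starts rather than via `SET`.]

```sql
-- Legacy default (spark.sql.legacy.createHiveTableByDefault=true):
-- CREATE TABLE with no USING / STORED AS clause produces a Hive serde table.
SET spark.sql.legacy.createHiveTableByDefault=true;
CREATE TABLE t_legacy (id INT);
DESCRIBE TABLE EXTENDED t_legacy;   -- Provider: hive

-- Proposed default (false): the same statement takes its provider from
-- spark.sql.sources.default, i.e. a Spark native table.
SET spark.sql.legacy.createHiveTableByDefault=false;
CREATE TABLE t_native (id INT);
DESCRIBE TABLE EXTENDED t_native;   -- Provider: parquet

-- Statements that name a provider explicitly are unaffected either way:
CREATE TABLE t_explicit (id INT) USING parquet;     -- always native
CREATE TABLE t_serde (id INT) STORED AS parquet;    -- always Hive serde
```

[As the proposal notes, setting the configuration back to `true` restores the legacy behavior wholesale; the explicit `USING` / `STORED AS` forms give per-statement control regardless of the default.]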