ok thanks got it

Mich Talebzadeh,
Technologist | Architect | Data Engineer | Generative AI | FinCrime
London
United Kingdom
view my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh

*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice: "one test result is worth one-thousand expert
opinions" (Wernher von Braun
<https://en.wikipedia.org/wiki/Wernher_von_Braun>).


On Thu, 25 Apr 2024 at 15:07, Wenchen Fan <cloud0...@gmail.com> wrote:

> It's for the data source. For example, Spark's built-in Parquet
> reader/writer is faster than the Hive serde Parquet reader/writer.
>
> On Thu, Apr 25, 2024 at 9:55 PM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> I see a statement made as below, and I quote:
>>
>> "The proposal of SPARK-46122 is to switch the default value of this
>> configuration from `true` to `false` to use Spark native tables because
>> we support better."
>>
>> Can you please elaborate on the above, specifically with regard to the
>> phrase "... because we support better"?
>>
>> Are you referring to the performance of the Spark catalog (I believe it
>> is internal) or to integration with Spark?
>>
>> HTH
>>
>> Mich Talebzadeh
>>
>> On Thu, 25 Apr 2024 at 11:17, Wenchen Fan <cloud0...@gmail.com> wrote:
>>
>>> +1
>>>
>>> On Thu, Apr 25, 2024 at 2:46 PM Kent Yao <y...@apache.org> wrote:
>>>
>>>> +1
>>>>
>>>> Nit: the umbrella ticket is SPARK-44111, not SPARK-44444.
>>>>
>>>> Thanks,
>>>> Kent Yao
>>>>
>>>> Dongjoon Hyun <dongjoon.h...@gmail.com> wrote on Thu, 25 Apr 2024 at 14:39:
>>>> >
>>>> > Hi, All.
>>>> >
>>>> > It's great to see community activities to polish 4.0.0 more and more.
>>>> > Thank you all.
>>>> >
>>>> > I'd like to bring SPARK-46122 (another SQL topic) to you from the
>>>> > subtasks of SPARK-44444 (Prepare Apache Spark 4.0.0).
>>>> >
>>>> > - https://issues.apache.org/jira/browse/SPARK-46122
>>>> >   Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
>>>> >
>>>> > This legacy configuration is about the `CREATE TABLE` SQL syntax
>>>> > without `USING` and `STORED AS`, which is currently mapped to a `Hive`
>>>> > table. The proposal of SPARK-46122 is to switch the default value of
>>>> > this configuration from `true` to `false` to use Spark native tables
>>>> > because we support better.
>>>> >
>>>> > In other words, Spark will use the value of `spark.sql.sources.default`
>>>> > as the table provider instead of `Hive`, like the other Spark APIs. Of
>>>> > course, users can get all the legacy behavior back by setting it to
>>>> > `true` again.
>>>> >
>>>> > Historically, this behavior change was merged once during the Apache
>>>> > Spark 3.0.0 preparation via SPARK-30098, but was reverted during the
>>>> > 3.0.0 RC period.
>>>> >
>>>> > 2019-12-06: SPARK-30098 Use default datasource as provider for CREATE
>>>> >   TABLE
>>>> > 2020-05-16: SPARK-31707 Revert SPARK-30098 Use default datasource as
>>>> >   provider for CREATE TABLE command
>>>> >
>>>> > At Apache Spark 3.1.0, we had another discussion about this and defined
>>>> > it as a legacy behavior behind this configuration, via the reused ID
>>>> > SPARK-30098.
>>>> >
>>>> > 2020-12-01: https://lists.apache.org/thread/8c8k1jk61pzlcosz3mxo4rkj5l23r204
>>>> > 2020-12-03: SPARK-30098 Add a configuration to use default datasource
>>>> >   as provider for CREATE TABLE command
>>>> >
>>>> > Last year, we received two additional requests to switch this, because
>>>> > Apache Spark 4.0.0 is a good time to make a decision for the future
>>>> > direction.
>>>> >
>>>> > 2023-02-27: SPARK-42603 as an independent idea.
>>>> > 2023-11-27: SPARK-46122 as a part of the Apache Spark 4.0.0 idea.
>>>> >
>>>> > WDYT? The technical scope is defined in the following PR, which is one
>>>> > line of main code, one line of migration guide, and a few lines of
>>>> > test code.
>>>> >
>>>> > - https://github.com/apache/spark/pull/46207
>>>> >
>>>> > Dongjoon.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
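[For readers following the thread, the behavior change under discussion can be sketched in Spark SQL. This is an illustrative sketch, not part of the proposal: the table names are made up, the `Provider` values shown assume the stock default `spark.sql.sources.default=parquet`, and in some deployments the legacy flag may need to be set before the session starts rather than via `SET`.]

```sql
-- Legacy default (spark.sql.legacy.createHiveTableByDefault=true):
-- CREATE TABLE with no USING / STORED AS clause produces a Hive serde table.
SET spark.sql.legacy.createHiveTableByDefault=true;
CREATE TABLE t_legacy (id INT);
DESCRIBE TABLE EXTENDED t_legacy;   -- Provider: hive

-- Proposed default (false): the same statement takes its provider from
-- spark.sql.sources.default, i.e. a Spark native table.
SET spark.sql.legacy.createHiveTableByDefault=false;
CREATE TABLE t_native (id INT);
DESCRIBE TABLE EXTENDED t_native;   -- Provider: parquet

-- Statements that name a provider explicitly are unaffected either way:
CREATE TABLE t_explicit (id INT) USING parquet;     -- always native
CREATE TABLE t_serde (id INT) STORED AS parquet;    -- always Hive serde
```

[As the proposal notes, setting the configuration back to `true` restores the legacy behavior wholesale; the explicit `USING` / `STORED AS` forms give per-statement control regardless of the default.]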