The phrase refers to the data source implementation. For example, Spark's built-in Parquet reader/writer is faster than the Hive SerDe Parquet reader/writer.
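To make the behavior change in the quoted proposal concrete, here is a toy Python model of how the provider is chosen for a `CREATE TABLE` statement. It is only a sketch of the rule as described in this thread, not Spark's actual code: the function `resolve_provider`, the plain `dict` standing in for the session configuration, and the fallback values used when a key is absent are all assumptions made for illustration. The sketch also does not model a statement that combines `USING` and `STORED AS`.

```python
# Toy model of the provider choice discussed in this thread.
# NOT Spark's implementation; names and structure are hypothetical.
LEGACY_KEY = "spark.sql.legacy.createHiveTableByDefault"
DEFAULT_SOURCE_KEY = "spark.sql.sources.default"


def resolve_provider(conf, using=None, stored_as=None):
    """Return the table provider for a CREATE TABLE statement.

    conf      -- dict of session settings (stand-in for the real config)
    using     -- provider named in a USING clause, if any
    stored_as -- file format named in a STORED AS clause, if any
    """
    if using is not None:
        return using.lower()   # an explicit USING clause names the provider
    if stored_as is not None:
        return "hive"          # STORED AS is Hive-style syntax
    # Neither clause is present, so the legacy flag decides.
    # Assumption: a missing key behaves like the pre-change default, "true".
    if conf.get(LEGACY_KEY, "true").lower() == "true":
        return "hive"
    # Otherwise fall back to the default data source ("parquet" unless overridden).
    return conf.get(DEFAULT_SOURCE_KEY, "parquet")


current_default = {LEGACY_KEY: "true"}
proposed_default = {LEGACY_KEY: "false"}

print(resolve_provider(current_default))    # hive
print(resolve_provider(proposed_default))   # parquet
```

Under this model, only statements without `USING` and `STORED AS` change behavior, and setting the flag back to `true` restores the legacy result.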
On Thu, Apr 25, 2024 at 9:55 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> I see a statement made as below and I quote
>
> "The proposal of SPARK-46122 is to switch the default value of this
> configuration from `true` to `false` to use Spark native tables because
> we support better."
>
> Can you please elaborate on the above specifically with regard to the
> phrase ".. because we support better."
>
> Are you referring to the performance of Spark catalog (I believe it is
> internal) or integration with Spark?
>
> HTH
>
> Mich Talebzadeh
>
> On Thu, 25 Apr 2024 at 11:17, Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> +1
>>
>> On Thu, Apr 25, 2024 at 2:46 PM Kent Yao <y...@apache.org> wrote:
>>
>>> +1
>>>
>>> Nit: the umbrella ticket is SPARK-44111, not SPARK-44444.
>>>
>>> Thanks,
>>> Kent Yao
>>>
>>> Dongjoon Hyun <dongjoon.h...@gmail.com> wrote on Thu, Apr 25, 2024 at 14:39:
>>> >
>>> > Hi, All.
>>> >
>>> > It's great to see community activities to polish 4.0.0 more and more.
>>> > Thank you all.
>>> >
>>> > I'd like to bring SPARK-46122 (another SQL topic) to you from the subtasks
>>> > of SPARK-44444 (Prepare Apache Spark 4.0.0),
>>> >
>>> > - https://issues.apache.org/jira/browse/SPARK-46122
>>> >   Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
>>> >
>>> > This legacy configuration is about `CREATE TABLE` SQL syntax without
>>> > `USING` and `STORED AS`, which is currently mapped to `Hive` table.
>>> > The proposal of SPARK-46122 is to switch the default value of this
>>> > configuration from `true` to `false` to use Spark native tables because
>>> > we support better.
>>> >
>>> > In other words, Spark will use the value of `spark.sql.sources.default`
>>> > as the table provider instead of `Hive` like the other Spark APIs. Of course,
>>> > the users can get all the legacy behavior by setting back to `true`.
>>> >
>>> > Historically, this behavior change was merged once at Apache Spark 3.0.0
>>> > preparation via SPARK-30098 already, but reverted during the 3.0.0 RC period.
>>> >
>>> > 2019-12-06: SPARK-30098 Use default datasource as provider for CREATE TABLE
>>> > 2020-05-16: SPARK-31707 Revert SPARK-30098 Use default datasource as
>>> > provider for CREATE TABLE command
>>> >
>>> > At Apache Spark 3.1.0, we had another discussion about this and defined it
>>> > as one of legacy behavior via this configuration via reused ID, SPARK-30098.
>>> >
>>> > 2020-12-01: https://lists.apache.org/thread/8c8k1jk61pzlcosz3mxo4rkj5l23r204
>>> > 2020-12-03: SPARK-30098 Add a configuration to use default datasource as
>>> > provider for CREATE TABLE command
>>> >
>>> > Last year, we received two additional requests twice to switch this because
>>> > Apache Spark 4.0.0 is a good time to make a decision for the future direction.
>>> >
>>> > 2023-02-27: SPARK-42603 as an independent idea.
>>> > 2023-11-27: SPARK-46122 as a part of Apache Spark 4.0.0 idea
>>> >
>>> >
>>> > WDYT? The technical scope is defined in the following PR which is one line of main
>>> > code, one line of migration guide, and a few lines of test code.
>>> >
>>> > - https://github.com/apache/spark/pull/46207
>>> >
>>> > Dongjoon.