It's a surprise to me to see someone take different positions within such a short period of time in the community.
Mich cast +1 for SPARK-44444 and -1 for SPARK-46122.

- https://lists.apache.org/thread/4cbkpvc3vr3b6k0wp6lgsw37spdpnqrc
- https://lists.apache.org/thread/x09gynt90v3hh5sql1gt9dlcn6m6699p

To Mich, what I'm interested in specifically is the following.

> 2. Compatibility: Changing the default behavior could potentially
> break existing workflows or pipelines that rely on the current behavior.

May I ask you the following questions?

A. What is the purpose of the migration guide in ASF projects?
B. Do you claim there is an incompatibility even when spark.sql.legacy.createHiveTableByDefault=true is set, as described in the migration guide?
C. Do you know that ANSI SQL introduces new RUNTIME exceptions which are harder than SPARK-46122?
D. Or did you cast +1 for SPARK-44444 because you think there is no breaking change by default?

I guess there is some misunderstanding of the proposal.

Thanks,
Dongjoon.

On Fri, Apr 26, 2024 at 12:05 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi,
>
> I would like to add a side note regarding the discussion process and the
> current title of the proposal. The title '[DISCUSS] SPARK-46122: Set
> spark.sql.legacy.createHiveTableByDefault to false' focuses on a specific
> configuration parameter, which might lead some participants to overlook
> its broader implications (as was raised by myself and others). I believe
> a more descriptive title, encompassing the broader discussion on default
> behaviours for creating Hive tables in Spark SQL, could enable greater
> engagement within the community. This is an important topic that deserves
> thorough consideration.
>
> HTH
>
> Mich Talebzadeh,
> Technologist | Architect | Data Engineer | Generative AI | FinCrime
> London, United Kingdom
>
> view my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, "one test result is worth one-thousand expert
> opinions" (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>
> On Fri, 26 Apr 2024 at 07:13, L. C. Hsieh <vii...@gmail.com> wrote:
>
>> +1
>>
>> On Thu, Apr 25, 2024 at 8:16 PM Yuming Wang <yumw...@apache.org> wrote:
>>
>>> +1
>>>
>>> On Fri, Apr 26, 2024 at 8:25 AM Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>>>
>>>> Of course, I can't think of a scenario with thousands of tables on a
>>>> single in-memory Spark cluster with an in-memory catalog.
>>>> Thanks for the help!
>>>>
>>>> On Thu, 25 Apr 2024 at 23:56, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Agreed. In scenarios where most of the interactions with the catalog
>>>>> relate to query planning, saving, and metadata management, the choice
>>>>> of catalog implementation may have less impact on query runtime
>>>>> performance. This is because the time spent on metadata operations is
>>>>> generally minimal compared to the time spent on actual data fetching,
>>>>> processing, and computation.
>>>>> However, scalability and reliability become real concerns as the size
>>>>> and complexity of the data and query workload grow. While an in-memory
>>>>> catalog may offer excellent performance for smaller workloads, it will
>>>>> face limitations in handling larger-scale deployments with thousands
>>>>> of tables, partitions, and users.
>>>>> Additionally, durability and persistence are crucial considerations,
>>>>> particularly in production environments where data integrity and
>>>>> availability are essential. In-memory catalog implementations may lack
>>>>> durability, meaning that metadata changes could be lost in the event
>>>>> of a system failure or restart. Therefore, while in-memory catalog
>>>>> implementations can provide speed and efficiency for certain use
>>>>> cases, we ought to consider the requirements for scalability,
>>>>> reliability, and data durability when choosing a catalog solution for
>>>>> production deployments. In many cases, a combination of in-memory and
>>>>> disk-based catalog solutions may offer the best balance of performance
>>>>> and resilience for demanding large-scale workloads.
>>>>>
>>>>> HTH
>>>>>
>>>>> Mich Talebzadeh
>>>>>
>>>>> On Thu, 25 Apr 2024 at 16:32, Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>>>>>
>>>>>> Of course, but it's in memory and not persisted, which is much
>>>>>> faster. And as I said, I believe most of the interaction with it
>>>>>> happens during planning and save rather than during the actual query
>>>>>> run, and those operations are short and minimal compared to data
>>>>>> fetching and manipulation, so I don't believe it will have a big
>>>>>> impact on query runtime...
>>>>>>
>>>>>> On Thu, 25 Apr 2024 at 17:52, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Well, I will be surprised, because the Derby database is single
>>>>>>> threaded and won't be of much use here.
>>>>>>>
>>>>>>> Most Hive metastores in the commercial world use PostgreSQL or
>>>>>>> Oracle as the metastore, which are battle proven, replicated, and
>>>>>>> backed up.
>>>>>>>
>>>>>>> Mich Talebzadeh
>>>>>>>
>>>>>>> On Thu, 25 Apr 2024 at 15:39, Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Yes, an in-memory Hive catalog backed by a local Derby DB.
>>>>>>>> And again, I presume that most metadata-related work happens during
>>>>>>>> planning and not during the actual run, so I don't see why it
>>>>>>>> should strongly affect query performance.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> On Thu, 25 Apr 2024 at 17:29, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> With regard to your point below:
>>>>>>>>>
>>>>>>>>> "The thing I'm missing is this: let's say that the output format I
>>>>>>>>> choose is Delta Lake or Iceberg or whatever format that uses
>>>>>>>>> parquet. Where does the catalog implementation (which holds
>>>>>>>>> metadata afaik, the same metadata that Iceberg and Delta Lake save
>>>>>>>>> for their tables about their columns) come into play, and why
>>>>>>>>> should it affect performance?"
>>>>>>>>>
>>>>>>>>> The catalog implementation comes into play regardless of the
>>>>>>>>> output format chosen (Delta Lake, Iceberg, Parquet, etc.) because
>>>>>>>>> it is responsible for managing metadata about the datasets,
>>>>>>>>> tables, schemas, and other objects stored in the aforementioned
>>>>>>>>> formats. Even though Delta Lake and Iceberg have their own internal
>>>>>>>>> metadata management mechanisms, they still rely on the catalog to
>>>>>>>>> provide a unified interface for accessing and manipulating
>>>>>>>>> metadata across different storage formats.
>>>>>>>>>
>>>>>>>>> "Another thing is that if I understand correctly, and I might be
>>>>>>>>> totally wrong here, the internal Spark catalog is a local
>>>>>>>>> installation of the Hive metastore anyway, so I'm not sure what
>>>>>>>>> the catalog has to do with anything."
>>>>>>>>>
>>>>>>>>> I don't understand this. Do you mean a Derby database?
>>>>>>>>>
>>>>>>>>> HTH
>>>>>>>>>
>>>>>>>>> Mich Talebzadeh
>>>>>>>>>
>>>>>>>>> On Thu, 25 Apr 2024 at 14:38, Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for the detailed answer.
>>>>>>>>>> The thing I'm missing is this: let's say that the output format I
>>>>>>>>>> choose is Delta Lake or Iceberg or whatever format that uses
>>>>>>>>>> parquet. Where does the catalog implementation (which holds
>>>>>>>>>> metadata afaik, the same metadata that Iceberg and Delta Lake
>>>>>>>>>> save for their tables about their columns) come into play, and
>>>>>>>>>> why should it affect performance?
>>>>>>>>>> Another thing is that if I understand correctly, and I might be
>>>>>>>>>> totally wrong here, the internal Spark catalog is a local
>>>>>>>>>> installation of the Hive metastore anyway, so I'm not sure what
>>>>>>>>>> the catalog has to do with anything.
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
>>>>>>>>>> On Thu, 25 Apr 2024 at 16:14, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> My take regarding your question is that your mileage varies, so
>>>>>>>>>>> to speak.
>>>>>>>>>>>
>>>>>>>>>>> 1) Hive provides a more mature and widely adopted catalog
>>>>>>>>>>> solution that integrates well with other components in the
>>>>>>>>>>> Hadoop ecosystem, such as HDFS, HBase, and YARN. If you are
>>>>>>>>>>> Hadoop-centric (say, on-premise), using Hive may offer better
>>>>>>>>>>> compatibility and interoperability.
>>>>>>>>>>> 2) Hive provides a SQL-like interface that is familiar to users
>>>>>>>>>>> accustomed to traditional RDBMSs. If your use case involves
>>>>>>>>>>> complex SQL queries or existing SQL-based workflows, using Hive
>>>>>>>>>>> may be advantageous.
>>>>>>>>>>> 3) If you are looking for performance, Spark's native catalog
>>>>>>>>>>> tends to offer better performance for certain workloads,
>>>>>>>>>>> particularly those that involve iterative processing or complex
>>>>>>>>>>> data transformations (my understanding). Spark's in-memory
>>>>>>>>>>> processing capabilities and optimizations make it well suited
>>>>>>>>>>> for interactive analytics and machine learning tasks (my
>>>>>>>>>>> favourite).
>>>>>>>>>>> 4) Integration with Spark workflows: if you primarily use Spark
>>>>>>>>>>> for data processing and analytics, using Spark's native catalog
>>>>>>>>>>> may simplify workflow management and reduce overhead. Spark's
>>>>>>>>>>> tight integration with its catalog allows for seamless
>>>>>>>>>>> interaction with Spark applications and libraries.
>>>>>>>>>>> 5) There seems to be some similarity between the Spark catalog
>>>>>>>>>>> and the Databricks Unity Catalog, so that may favour the choice.
>>>>>>>>>>>
>>>>>>>>>>> HTH
>>>>>>>>>>>
>>>>>>>>>>> Mich Talebzadeh
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 25 Apr 2024 at 12:30, Nimrod Ofek <ofek.nim...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I will also appreciate some material that describes the
>>>>>>>>>>>> differences between Spark native tables and Hive tables and why
>>>>>>>>>>>> each should be used...
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Nimrod
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, 25 Apr 2024 at 14:27, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I see a statement made as below, and I quote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> "The proposal of SPARK-46122 is to switch the default value of
>>>>>>>>>>>>> this configuration from `true` to `false` to use Spark native
>>>>>>>>>>>>> tables because we support better."
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you please elaborate on the above, specifically with
>>>>>>>>>>>>> regard to the phrase "... because we support better."
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are you referring to the performance of the Spark catalog (I
>>>>>>>>>>>>> believe it is internal) or to integration with Spark?
>>>>>>>>>>>>>
>>>>>>>>>>>>> HTH
>>>>>>>>>>>>>
>>>>>>>>>>>>> Mich Talebzadeh
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, 25 Apr 2024 at 11:17, Wenchen Fan <cloud0...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Apr 25, 2024 at 2:46 PM Kent Yao <y...@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Nit: the umbrella ticket is SPARK-44111, not SPARK-44444.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Kent Yao
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Apr 25, 2024 at 14:39, Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Hi, All.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > It's great to see community activities to polish 4.0.0
>>>>>>>>>>>>>>> > more and more. Thank you all.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > I'd like to bring SPARK-46122 (another SQL topic) to you
>>>>>>>>>>>>>>> > from the subtasks of SPARK-44444 (Prepare Apache Spark 4.0.0).
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > - https://issues.apache.org/jira/browse/SPARK-46122
>>>>>>>>>>>>>>> >   Set `spark.sql.legacy.createHiveTableByDefault` to `false` by default
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > This legacy configuration is about the `CREATE TABLE` SQL
>>>>>>>>>>>>>>> > syntax without `USING` and `STORED AS`, which is currently
>>>>>>>>>>>>>>> > mapped to a `Hive` table. The proposal of SPARK-46122 is
>>>>>>>>>>>>>>> > to switch the default value of this configuration from
>>>>>>>>>>>>>>> > `true` to `false` to use Spark native tables because we
>>>>>>>>>>>>>>> > support better.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > In other words, Spark will use the value of
>>>>>>>>>>>>>>> > `spark.sql.sources.default` as the table provider instead
>>>>>>>>>>>>>>> > of `Hive`, like the other Spark APIs. Of course, users can
>>>>>>>>>>>>>>> > get all the legacy behavior back by setting it to `true`.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Historically, this behavior change was merged once during
>>>>>>>>>>>>>>> > the Apache Spark 3.0.0 preparation via SPARK-30098, but
>>>>>>>>>>>>>>> > reverted during the 3.0.0 RC period.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > 2019-12-06: SPARK-30098 Use default datasource as provider for CREATE TABLE
>>>>>>>>>>>>>>> > 2020-05-16: SPARK-31707 Revert SPARK-30098 Use default datasource as provider for CREATE TABLE command
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > At Apache Spark 3.1.0, we had another discussion about
>>>>>>>>>>>>>>> > this and defined it as one of the legacy behaviors via
>>>>>>>>>>>>>>> > this configuration, under the reused ID SPARK-30098.
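To make the scope of the change concrete, here is a small illustrative sketch in plain Python (not Spark's actual parser; the function name is made up for illustration). Per the description above, only `CREATE TABLE` statements that specify neither a `USING` clause nor a `STORED AS` clause are affected by the configuration:

```python
# Illustrative sketch, NOT Spark internals: a rough check for whether a
# CREATE TABLE statement falls under spark.sql.legacy.createHiveTableByDefault.
# Statements with an explicit USING or STORED AS keep their chosen provider.
def is_affected_by_spark_46122(ddl: str) -> bool:
    s = " ".join(ddl.upper().split())  # normalize case and whitespace
    if not s.startswith("CREATE TABLE"):
        return False
    return "USING " not in s and "STORED AS " not in s

# Bare CREATE TABLE: provider depends on the legacy configuration.
assert is_affected_by_spark_46122("CREATE TABLE t (c INT)")
# Explicit provider: unaffected by the configuration either way.
assert not is_affected_by_spark_46122("CREATE TABLE t (c INT) USING parquet")
assert not is_affected_by_spark_46122("CREATE TABLE t (c INT) STORED AS orc")
```

This mirrors the "one line of main code" scope described in the proposal: only the default provider for the bare form changes.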
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > 2020-12-01: https://lists.apache.org/thread/8c8k1jk61pzlcosz3mxo4rkj5l23r204
>>>>>>>>>>>>>>> > 2020-12-03: SPARK-30098 Add a configuration to use default datasource as provider for CREATE TABLE command
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Last year, we received two additional requests to switch
>>>>>>>>>>>>>>> > this, because Apache Spark 4.0.0 is a good time to make a
>>>>>>>>>>>>>>> > decision for the future direction.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > 2023-02-27: SPARK-42603 as an independent idea.
>>>>>>>>>>>>>>> > 2023-11-27: SPARK-46122 as a part of the Apache Spark 4.0.0 idea.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > WDYT? The technical scope is defined in the following PR,
>>>>>>>>>>>>>>> > which is one line of main code, one line of migration
>>>>>>>>>>>>>>> > guide, and a few lines of test code.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > - https://github.com/apache/spark/pull/46207
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Dongjoon.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
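The default flip debated in this thread can be summarized with a short sketch. This is a hedged simulation in plain Python of the resolution rule described in the proposal (the function name and the `"parquet"` default are illustrative assumptions, not Spark's internal code): for a bare `CREATE TABLE`, the legacy flag picks `hive`, otherwise the value of `spark.sql.sources.default` is used.

```python
# Illustrative sketch of the SPARK-46122 proposal, NOT Spark internals:
# how the table provider would be resolved for a CREATE TABLE statement
# that has neither USING nor STORED AS.
def resolve_table_provider(create_hive_table_by_default: bool,
                           default_source: str = "parquet") -> str:
    if create_hive_table_by_default:
        # Legacy behavior: spark.sql.legacy.createHiveTableByDefault=true
        return "hive"
    # Proposed 4.0.0 default: fall back to spark.sql.sources.default
    return default_source

# Spark 3.x default (legacy flag true): a bare CREATE TABLE is a Hive table.
print(resolve_table_provider(True))    # hive
# Proposed Spark 4.0.0 default (legacy flag false): a Spark native table
# using spark.sql.sources.default.
print(resolve_table_provider(False))   # parquet
```

As the thread notes, users who need the old behavior can opt back in by setting the legacy flag to `true`, which is the compatibility path described in the migration guide.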