Re: [VOTE][DISCUSS] A Spark SQL command or Call procedure

Paul Lam Wed, 25 Sep 2024 20:21:28 -0700

+1 for option 2.

The syntax is more popular and familiar to most users.


Best,
Paul Lam

> On Sep 25, 2024, at 17:37, gabrywu <[email protected]> wrote:
> 
> +1 for option 2.
> I'm excited to learn about SPARK-48781; it would be perfect if Spark supported
> stored procedures. In this PR I will use the `CALL` syntax, which only looks like a
> `stored procedure`, and adapt it to real stored procedures in the
> future.
> 
> XiDuo You <[email protected]> wrote on Wed, Sep 25, 2024 at 13:12:
> 
>> +1 for option 2
>> thank you
>> 
>> Fei Wang <[email protected]> wrote on Wed, Sep 25, 2024 at 11:53:
>>> 
>>> Prefer option 2 as well.
>>> 
>>> BTW, it is necessary to support compacting a single partition of a
>>> partitioned table.
>>> 
>>> On 2024/09/24 07:19:27 Cheng Pan wrote:
>>>> Hi Gabry, thanks for bringing up this discussion. Usually, when we
>>>> want to discuss an idea and make a decision, instead of starting a thread
>>>> tagged with both [DISCUSS] and [VOTE], we first start a [DISCUSS] thread
>>>> with all options collected. During the discussion, the pros and cons of
>>>> each option are listed and compared, and ideally everyone involved
>>>> eventually reaches a consensus. If not, we take the most supported option
>>>> as the candidate and start a [VOTE], with
>>>> 
>>>> +1 adopt
>>>> +0 does not care
>>>> -1 reject because …
>>>> 
>>>> Back to the topic itself, there are actually 3 options:
>>>> 
>>>> Option 1: new syntax COMPACT TABLE <table_name> [INTO <target_size>] [CLEANUP | RETAIN | LIST]
>>>> Option 2: CALL compact_table(args …)
>>>> Option 3: VACUUM <table_name> [OTHER ARGS]
>>>> 
>>>> I prefer option 2, then 3. Given Delta's and Iceberg's dominance in the
>>>> lakehouse market, I suggest following either Delta's VACUUM or Iceberg's
>>>> CALL syntax. Plus, the Kyuubi Spark extension has already adopted Delta's ZORDER
>>>> syntax, and Spark 4.0 adopted Iceberg's CALL syntax (see SPARK-48781).
>>>> 
>>>> Thanks,
>>>> Cheng Pan
>>>> 
>>>> 
>>>> 
>>>>> On Sep 19, 2024, at 19:02, gabrywu <[email protected]> wrote:
>>>>> 
>>>>> Hi, folks,
>>>>> I'm creating PR #6695 <https://github.com/apache/kyuubi/pull/6695>
>>>>> to add a new extended Spark SQL command that merges small files. A few
>>>>> PMC members and committers suggest that it would be better to create a
>>>>> new CALL procedure instead.
>>>>> So I'm posting this email to vote on which is the best way to extend
>>>>> Spark SQL. Whatever the result, we can treat it as the final decision
>>>>> for creating new Spark extensions in upcoming PRs.
>>>>> 
>>>>> The VOTE will remain open for at least 2 weeks.
>>>>> 
>>>>> [ ] +1 Spark SQL Command
>>>>> [ ] +0 Both are OK
>>>>> [ ] -1 Spark Call Procedure
>>>>> 
>>>> 
>>>> 
>>
