Hi Gabry, thanks for bringing up this discussion. Usually, when we want to 
discuss an idea and make a decision, we don't start a thread tagged with both 
[DISCUSS] and [VOTE]. Instead, we first start a [DISCUSS] thread with all the 
options collected. During the discussion, the pros and cons of each option are 
listed and compared, and ideally everyone involved eventually reaches a 
consensus. If not, we take the most supported option as the candidate and 
start a [VOTE] with:

+1 adopt
+0 does not care
-1 reject because …

Back to the topic itself, there are actually 3 options:

Option 1: new syntax COMPACT TABLE <table_name> [INTO <target_size>] [CLEANUP | RETAIN | LIST]
Option 2: CALL compact_table(args …)
Option 3: VACUUM <table_name> [OTHER ARGS]

I prefer option 2, then option 3. Given Delta's and Iceberg's dominance in the 
lakehouse market, I suggest following either Delta's VACUUM syntax or Iceberg's 
CALL syntax. In addition, the Kyuubi Spark extension has already adopted Delta's 
ZORDER syntax, and Spark 4.0 has adopted the Iceberg CALL syntax (see SPARK-48781).

Thanks,
Cheng Pan



> On Sep 19, 2024, at 19:02, gabrywu <gabr...@apache.org> wrote:
> 
> Hi, folks, 
> I'm creating PR #6695 <https://github.com/apache/kyuubi/pull/6695> to add 
> a new extended Spark SQL command for merging small files. A few PMC members 
> and committers proposed that it would be better to create a new Call 
> Procedure instead. 
> So I'm posting this email to vote on which is the best way to extend 
> Spark SQL. Whatever the result, we can treat it as the final decision for 
> creating new Spark extensions in upcoming PRs.
> 
> The VOTE will remain open for at least 2 weeks.
> 
> [ ] +1 Spark SQL Command
> [ ] +0 Both are OK
> [ ] -1 Spark Call Procedure
> 
