Hi all,
Many Spark users in my company are asking for a way to control the number
of output files in Spark SQL. Some use cases need fewer files, others need
more. The users would rather not use the functions repartition(n) or
coalesce(n, shuffle), because those require them to write and deploy
Scala/Java/Python code.
Could we introduce a query hint for this purpose (similar to Broadcast Join
Hints)? For example:
/*+ COALESCE(n, shuffle) */
In general, is a query hint the best way to bring DataFrame functionality
to SQL without extending the SQL syntax? Any suggestions are highly
appreciated.
This requirement is not the same as SPARK-6221, which asked for
auto-merging output files.
Thanks,
John Zhuge