John Zhuge created SPARK-24940:
----------------------------------
Summary: Coalesce Hint for SQL
Key: SPARK-24940
URL: https://issues.apache.org/jira/browse/SPARK-24940
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.1.1
Reporter: John Zhuge
Many Spark SQL users in my company have asked for a way to control the number
of output files in Spark SQL. They prefer not to use the functions
repartition(n) or coalesce(n, shuffle), which require them to write and deploy
Scala/Java/Python code.
There are use cases both for reducing and for increasing the number of output files.
The DataFrame API has had repartition/coalesce for a long time. However, there
is no equivalent functionality in SQL queries. We propose adding the following
Hive-style hints to Spark SQL:
{noformat}
/*+ COALESCE(n, shuffle) */
/*+ REPARTITION(n) */
{noformat}
REPARTITION(n) is equivalent to COALESCE(n, shuffle=true).
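To illustrate why both forms are useful, here is a toy pure-Python model of the RDD-level semantics the hints would expose. It is an illustration only, not Spark's implementation: partitions are modeled as plain lists, the contiguous merge and the round-robin redistribution are simplifying assumptions, and the function names are hypothetical. It shows that coalesce without a shuffle can only reduce the partition count (by merging existing partitions), while shuffle=true, i.e. repartition, can also increase it.

```python
from typing import List, TypeVar

T = TypeVar("T")


def coalesce(partitions: List[List[T]], n: int, shuffle: bool = False) -> List[List[T]]:
    """Toy model of coalesce(n, shuffle); not Spark's actual algorithm."""
    if n <= 0:
        raise ValueError("number of partitions must be positive")
    if not shuffle:
        # Without a shuffle, asking for more partitions leaves the count unchanged.
        if n >= len(partitions):
            return [list(p) for p in partitions]
        # Merge contiguous groups of existing partitions into n output partitions
        # (assumption: a simple contiguous grouping stands in for Spark's locality-aware one).
        out: List[List[T]] = [[] for _ in range(n)]
        for i, part in enumerate(partitions):
            out[i * n // len(partitions)].extend(part)
        return out
    # With a shuffle, rows are redistributed (here: round-robin) across n partitions,
    # so the count can go up as well as down.
    out = [[] for _ in range(n)]
    rows = [row for part in partitions for row in part]
    for i, row in enumerate(rows):
        out[i % n].append(row)
    return out


def repartition(partitions: List[List[T]], n: int) -> List[List[T]]:
    """REPARTITION(n) modeled as COALESCE(n, shuffle=true)."""
    return coalesce(partitions, n, shuffle=True)


if __name__ == "__main__":
    parts = [[1, 2], [3], [4, 5, 6], [7]]
    print(len(coalesce(parts, 2)))     # fewer files: partitions merged
    print(len(coalesce(parts, 8)))     # no shuffle: count cannot grow
    print(len(repartition(parts, 8)))  # shuffle: count can grow
```

In this model the no-shuffle path avoids moving rows between partitions, which is why it is cheap but limited to reducing the count.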
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)