John Zhuge created SPARK-24940:
----------------------------------

             Summary: Coalesce Hint for SQL
                 Key: SPARK-24940
                 URL: https://issues.apache.org/jira/browse/SPARK-24940
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.1.1
            Reporter: John Zhuge

Many Spark SQL users at my company have asked for a way to control the number of output files in Spark SQL. They prefer not to use the functions repartition(n) or coalesce(n, shuffle), which require them to write and deploy Scala/Java/Python code. There are use cases for both reducing and increasing the number of files.

The DataFrame API has had repartition/coalesce for a long time. However, there is no equivalent functionality in SQL queries. We propose adding the following Hive-style Coalesce hint to Spark SQL:

{noformat}
/*+ COALESCE(n, shuffle) */
/*+ REPARTITION(n) */
{noformat}

REPARTITION(n) is equivalent to COALESCE(n, shuffle=true).
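
To make the proposed semantics concrete, here is a minimal sketch in plain Python (no Spark required). It is not Spark's parser or planner. The names parse_partition_hint, PartitionHint and resulting_partitions are hypothetical, and two things are assumptions rather than part of the proposal: that the shuffle argument is written as a true/false literal, and that COALESCE(n) without the flag means no shuffle. The partition-count rule follows the behaviour of RDD.coalesce, where a non-shuffling coalesce can lower the count but not raise it.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical helper, not part of Spark: recognises the two hint forms
# from the proposal inside a /*+ ... */ comment.
_HINT_RE = re.compile(
    r"/\*\+\s*(COALESCE|REPARTITION)\s*\(\s*(\d+)\s*(?:,\s*(true|false)\s*)?\)\s*\*/",
    re.IGNORECASE,
)


class PartitionHint(NamedTuple):
    num_partitions: int
    shuffle: bool


def parse_partition_hint(sql: str) -> Optional[PartitionHint]:
    """Return the first COALESCE/REPARTITION hint found in sql, or None."""
    match = _HINT_RE.search(sql)
    if match is None:
        return None
    name = match.group(1).upper()
    n = int(match.group(2))
    shuffle = match.group(3)
    if name == "REPARTITION":
        if shuffle is not None:
            raise ValueError("REPARTITION takes only a partition count")
        # REPARTITION(n) is COALESCE(n, shuffle=true).
        return PartitionHint(n, True)
    # Assumption: COALESCE without an explicit flag does not shuffle.
    return PartitionHint(n, shuffle is not None and shuffle.lower() == "true")


def resulting_partitions(current: int, hint: PartitionHint) -> int:
    """Without a shuffle the count can only go down; with one it becomes n."""
    if hint.shuffle:
        return hint.num_partitions
    return min(current, hint.num_partitions)
```

For example, parse_partition_hint("SELECT /*+ REPARTITION(200) */ * FROM t") yields a hint with shuffle set, so the sketch would take a 10-partition input to 200 partitions, whereas a non-shuffling COALESCE(200) on the same input would leave it at 10.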