[ https://issues.apache.org/jira/browse/SPARK-24940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Zhuge updated SPARK-24940:
-------------------------------
    Description:
Many Spark SQL users in my company have asked for a way to control the number of output files in Spark SQL. The users prefer not to use the repartition(n) or coalesce(n, shuffle) functions, which require them to write and deploy Scala/Java/Python code.

There are use cases for both reducing and increasing the number of output files.

The DataFrame API has had repartition/coalesce for a long time. However, we do not have equivalent functionality in SQL queries. We propose adding the following Hive-style Coalesce hints to Spark SQL.
{noformat}
/*+ COALESCE(n, shuffle) */
/*+ REPARTITION(n) */
{noformat}
REPARTITION(n) is equal to COALESCE(n, shuffle=true).

> Coalesce Hint for SQL
> ---------------------
>
>                 Key: SPARK-24940
>                 URL: https://issues.apache.org/jira/browse/SPARK-24940
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.1
>            Reporter: John Zhuge
>            Priority: Major
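
The relationship between the two proposed hints can be illustrated with a toy model in plain Python. This is only a sketch: `coalesce` and `repartition` below are hypothetical stand-ins that operate on in-memory lists, not Spark's implementation. It assumes the proposed hints would mirror the existing RDD-level behaviour, where coalescing without a shuffle can only reduce the partition count, and the round-robin placement is a simplification chosen for illustration.

```python
from typing import List

Partition = List[int]


def coalesce(partitions: List[Partition], n: int, shuffle: bool = False) -> List[Partition]:
    """Toy model of coalesce(n, shuffle) over lists of rows (not Spark code)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if shuffle:
        # With a shuffle every row may move, so the result always has
        # exactly n partitions, whether n is smaller or larger than before.
        rows = [row for part in partitions for row in part]
        shuffled: List[Partition] = [[] for _ in range(n)]
        for i, row in enumerate(rows):
            shuffled[i % n].append(row)
        return shuffled
    if n >= len(partitions):
        # Without a shuffle, whole partitions can only be merged,
        # so the partition count cannot grow.
        return [list(part) for part in partitions]
    merged: List[Partition] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        merged[i % n].extend(part)
    return merged


def repartition(partitions: List[Partition], n: int) -> List[Partition]:
    """REPARTITION(n) modelled as COALESCE(n, shuffle=true)."""
    return coalesce(partitions, n, shuffle=True)


if __name__ == "__main__":
    parts = [[1, 2], [3, 4], [5, 6], [7, 8]]
    print(len(coalesce(parts, 2)))     # 2: reduced without a shuffle
    print(len(coalesce(parts, 8)))     # 4: cannot increase without a shuffle
    print(len(repartition(parts, 8)))  # 8: a shuffle allows an increase
```

In this model, reducing the number of output files works with either hint, while increasing it needs the shuffle variant.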