[ https://issues.apache.org/jira/browse/SPARK-32932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Manu Zhang updated SPARK-32932:
-------------------------------

Description:

With AQE, the local shuffle reader breaks users' repartitioning for dynamic partition overwrite, as in the following case.

{code:java}
test("repartition with local reader") {
  withSQLConf(
      SQLConf.PARTITION_OVERWRITE_MODE.key -> PartitionOverwriteMode.DYNAMIC.toString,
      SQLConf.SHUFFLE_PARTITIONS.key -> "5",
      SQLConf.ADAPTIVE_EXECUTION_ENABLED.key -> "true") {
    withTable("t") {
      val data = for (
        i <- 1 to 10;
        j <- 1 to 3
      ) yield (i, j)
      data.toDF("a", "b")
        .repartition($"b")
        .write
        .partitionBy("b")
        .mode("overwrite")
        .saveAsTable("t")
      assert(spark.read.table("t").inputFiles.length == 3)
    }
  }
}
{code}

Coalescing shuffle partitions could also break it.

was:
With AQE, the local reader optimizer breaks users' repartitioning for dynamic partition overwrite, as in the following case (same code example as above).

Coalescing shuffle partitions could also break it.


> AQE local shuffle reader breaks repartitioning for dynamic partition overwrite
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-32932
>                 URL: https://issues.apache.org/jira/browse/SPARK-32932
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Manu Zhang
>            Priority: Minor
>
> With AQE, the local shuffle reader breaks users' repartitioning for dynamic
> partition overwrite, as in the following case.
> (code example as above)
> Coalescing shuffle partitions could also break it.


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
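For illustration, the file-count arithmetic behind the failing assertion can be sketched without Spark. This is a toy model, not Spark's actual write path; `FileCountSketch`, `filesWritten`, and the task groupings are hypothetical stand-ins. In dynamic partition overwrite, each write task emits one file per distinct partition value it holds, so `repartition($"b")` yields 3 files, while a local shuffle reader that hands each task a full mapper's output lets every task see all 3 values of `b`.

```scala
// Toy model (hypothetical, not Spark code): total files written under
// dynamic partition overwrite = sum over write tasks of the number of
// distinct partition values each task holds.
object FileCountSketch {
  def filesWritten(tasks: Seq[Seq[Int]]): Int = tasks.map(_.distinct.size).sum

  def main(args: Array[String]): Unit = {
    // Same data as the test: (a, b) pairs with b in 1..3.
    val rows = for (i <- 1 to 10; j <- 1 to 3) yield (i, j)

    // repartition($"b") with 5 shuffle partitions: each value of b lands in
    // exactly one reduce task, so the table ends up with exactly 3 files.
    val reduceTasks = rows.groupBy { case (_, b) => b % 5 }.values
      .map(_.map(_._2)).toSeq
    assert(filesWritten(reduceTasks) == 3)

    // AQE's local shuffle reader instead reads map outputs locally; with,
    // say, 2 map tasks, every task sees all 3 values of b: 2 x 3 = 6 files.
    val mapTasks = rows.grouped(rows.size / 2).map(_.map(_._2).toSeq).toSeq
    assert(filesWritten(mapTasks) == 6)

    println(s"reduce-side files: ${filesWritten(reduceTasks)}, " +
      s"local-reader files: ${filesWritten(mapTasks)}")
  }
}
```

A possible mitigation (assuming Spark 3.0's config keys, not stated in the original report) is to set `spark.sql.adaptive.localShuffleReader.enabled` and `spark.sql.adaptive.coalescePartitions.enabled` to `false` so AQE does not rewrite the shuffle introduced by the user's `repartition`.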