Re: [Spark Core]: Adding support for size based partition coalescing

mhawes Fri, 21 May 2021 10:12:11 -0700

Adding /another/ update to say that I'm now planning to use a recently
introduced feature: calling `.repartition()` with no args lets AQE (adaptive
query execution) coalesce the resulting partitions. This actually suits our
use-case perfectly!


Example:

        sparkSession.conf().set("spark.sql.adaptive.enabled", "true");
        Dataset<Long> dataset = sparkSession.range(1, 4, 1, 4).repartition();

        assertThat(dataset.rdd().collectPartitions().length).isEqualTo(1); // true
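
For anyone skimming the thread, here is a toy, Spark-free sketch of what
"size based coalescing" means: adjacent partitions are merged greedily until
adding the next one would exceed a target size. This is only an illustration
under my own assumptions, not Spark's actual AQE implementation. The class
name `CoalesceSketch`, the method `coalesce`, and the byte sizes are all
hypothetical. The `targetBytes` parameter is meant to play roughly the role of
`spark.sql.adaptive.advisoryPartitionSizeInBytes`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Spark.
class CoalesceSketch {

    /**
     * Greedily merges adjacent partitions. A group is closed as soon as
     * adding the next partition would push it past targetBytes.
     * Returns the total size in bytes of each resulting group.
     */
    static List<Long> coalesce(long[] partitionBytes, long targetBytes) {
        List<Long> groups = new ArrayList<>();
        long current = 0;
        boolean open = false; // true once the current group holds a partition

        for (long size : partitionBytes) {
            if (open && current + size > targetBytes) {
                groups.add(current); // close the group before it overflows
                current = 0;
            }
            current += size;
            open = true;
        }
        if (open) {
            groups.add(current); // flush the last group
        }
        return groups;
    }

    public static void main(String[] args) {
        // Four tiny partitions fit in one group, like the example above.
        System.out.println(coalesce(new long[] {8, 8, 8, 8}, 64));      // [32]
        // An oversized partition is kept on its own rather than split.
        System.out.println(coalesce(new long[] {10, 20, 30, 100, 5}, 64)); // [60, 100, 5]
    }
}
```

Note that a partition larger than the target is never split in this sketch; it
just ends up in a group of its own.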


Relevant PRs/Issues:
[SPARK-31220][SQL] repartition obeys initialPartitionNum when
adaptiveExecutionEnabled https://github.com/apache/spark/pull/27986
[SPARK-32056][SQL] Coalesce partitions for repartition by expressions when
AQE is enabled https://github.com/apache/spark/pull/28900
[SPARK-32056][SQL][Follow-up] Coalesce partitions for repartition hint and
sql when AQE is enabled https://github.com/apache/spark/pull/28952


