[
https://issues.apache.org/jira/browse/SPARK-39602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562358#comment-17562358
]
Jungtaek Lim edited comment on SPARK-39602 at 7/5/22 4:14 AM:
--------------------------------------------------------------
Why not make the number of partitions configurable via an existing configuration
mechanism (not only Spark config but any config utility)? You should be able to
produce a different config file per environment (dev, stage, prod), which should
achieve this. I don't think Spark can tell whether end users are running a query
for test purposes or not.
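The suggestion above can be sketched as follows — a minimal example of reading the partition count from configuration with an environment-specific default, rather than hard-coding it in the job. The config key `spark.myapp.numPartitions` is a hypothetical name chosen for illustration; any config utility that yields a per-environment map would work the same way.

```scala
// Minimal sketch: resolve the repartition count from a config map, falling
// back to a default. In production the map would come from e.g.
// spark.conf.getAll or an external config file; "spark.myapp.numPartitions"
// is a hypothetical key, not a real Spark setting.
object PartitionConfig {
  def numPartitions(conf: Map[String, String], default: Int): Int =
    conf.get("spark.myapp.numPartitions").map(_.toInt).getOrElse(default)
}

// Usage inside a job (assuming an active SparkSession `spark`):
//   val n = PartitionConfig.numPartitions(spark.conf.getAll, default = 200)
//   ds.rdd.repartition(n)
```

The test environment's config file would then set a small value (say 2), while prod sets the large one, without any branching in the job code itself.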
> Invoking .repartition(100000) in a unit test causes the unit test to take >20
> minutes.
> --------------------------------------------------------------------------------------
>
> Key: SPARK-39602
> URL: https://issues.apache.org/jira/browse/SPARK-39602
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.3.0
> Reporter: Tanin Na Nakorn
> Priority: Major
>
> Here's a proof of concept:
> {code}
> val result = spark
>   .createDataset(List("test"))
>   .rdd
>   .repartition(100000)
>   .map { _ => "test" }
>   .collect()
>   .toList
>
> println(result)
> {code}
> This code takes a very long time in a unit test.
> We aim to test for correctness in unit tests... not to test the repartitioning itself.
> Is there a way to make it faster? (e.g. disable partitioning in tests)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]