[
https://issues.apache.org/jira/browse/SPARK-39602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562358#comment-17562358
]
Jungtaek Lim edited comment on SPARK-39602 at 7/5/22 4:14 AM:
--------------------------------------------------------------
Why not make the number of partitions configurable via an existing configuration
mechanism (not only Spark config but any config utility)? You should be able to
produce a different config file per environment (dev, stage, prod), which should
achieve this. I don't think Spark can tell whether end users are running a query
for test purposes or not.
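The suggestion above can be sketched as follows — a minimal example of reading the partition count from configuration with an environment-specific default, rather than hard-coding it in the job. The config key `spark.myapp.numPartitions` is a hypothetical name chosen for illustration; any config utility that yields a per-environment map would work the same way.

```scala
// Minimal sketch: resolve the repartition count from a config map, falling
// back to a default. In production the map would come from e.g.
// spark.conf.getAll or an external config file; "spark.myapp.numPartitions"
// is a hypothetical key, not a real Spark setting.
object PartitionConfig {
  def numPartitions(conf: Map[String, String], default: Int): Int =
    conf.get("spark.myapp.numPartitions").map(_.toInt).getOrElse(default)
}

// Usage inside a job (assuming an active SparkSession `spark`):
//   val n = PartitionConfig.numPartitions(spark.conf.getAll, default = 200)
//   ds.rdd.repartition(n)
```

The test environment's config file would then set a small value (say 2), while prod sets the large one, without any branching in the job code itself.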
> Invoking .repartition(100000) in a unit test causes the unit test to take >20
> minutes.
> --------------------------------------------------------------------------------------
>
> Key: SPARK-39602
> URL: https://issues.apache.org/jira/browse/SPARK-39602
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.3.0
> Reporter: Tanin Na Nakorn
> Priority: Major
>
> Here's a proof of concept:
> {code}
> val result = spark
>   .createDataset(List("test"))
>   .rdd
>   .repartition(100000)
>   .map { _ => "test" }
>   .collect()
>   .toList
>
> println(result)
> {code}
> This code takes a very long time in a unit test.
> We aim to test for correctness in unit tests... not to test the repartitioning itself.
> Is there a way to make it faster? (e.g. disable partitioning in tests)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]