[ 
https://issues.apache.org/jira/browse/SPARK-39603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562327#comment-17562327
 ] 

Hyukjin Kwon commented on SPARK-39603:
--------------------------------------

Mind showing the reproducer? It's very difficult to assess the problem with 
just text here.

> Dataset planning in a unit test takes a very long time to finish (e.g. >8mins 
> for complex job)
> ----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-39603
>                 URL: https://issues.apache.org/jira/browse/SPARK-39603
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.3.0
>            Reporter: Tanin Na Nakorn
>            Priority: Major
>
> At Stripe, we have a very complex data job. The unit test was running fine 
> when we used RDD.
> After we switched to Dataset, the unit test takes considerably longer (e.g. > 
> 8 mins just for planning).
> Most of our unit tests only process 1-2 records.
> We have tried to investigate it a bit, and we are somewhat sure it's the 
> planning phase.
> We tried disabling almost all optimizers except the ~10 optimizers that can't 
> be disabled. It doesn't impact the test run time at all.
> Is there a way to make Dataset planning faster in unit tests?
> Thank you!
> (Please excuse us; I may be using inaccurate terms.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
