[ https://issues.apache.org/jira/browse/SPARK-39603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562327#comment-17562327 ]
Hyukjin Kwon commented on SPARK-39603:
--------------------------------------
Mind showing the reproducer? It's very difficult to assess the problem with
just text here.
> Dataset planning in a unit test takes a very long time to finish (e.g. >8mins
> for complex job)
> ----------------------------------------------------------------------------------------------
>
> Key: SPARK-39603
> URL: https://issues.apache.org/jira/browse/SPARK-39603
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.3.0
> Reporter: Tanin Na Nakorn
> Priority: Major
>
> At Stripe, we have a very complex data job. The unit test was running fine
> when we used RDD.
> After we switched to Dataset, the unit test takes considerably longer (e.g.
> more than 8 mins just for planning).
> Most of our unit tests only process 1-2 records.
> We have tried to investigate it a bit, and we are somewhat sure it's the
> planning phase.
> We tried disabling almost all optimizer rules except the ~10 that can't be
> disabled. It doesn't impact the test run time at all.
> Is there a way to make Dataset planning faster in unit tests?
> Thank you!
> (Please excuse us. I may be using inaccurate terms.)