[
https://issues.apache.org/jira/browse/SPARK-35365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17342354#comment-17342354
]
Yuming Wang commented on SPARK-35365:
-------------------------------------
[~xiaohua] Could you check which rule affects the performance? For example (per-rule times are in nanoseconds):
{noformat}
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 3022
Total time: 7.941302436 seconds

Rule                                                                  Effective Time / Total Time   Effective Runs / Total Runs
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations      3350202022 / 3357847817       7 / 39
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions      11946476 / 588567543          6 / 39
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences     516175887 / 577794974         15 / 39
org.apache.spark.sql.catalyst.analysis.TimeWindowing                  0 / 519817133                 0 / 39
org.apache.spark.sql.catalyst.analysis.DecimalPrecision               226306881 / 271650752         11 / 39
org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin            9838775 / 202214973           1 / 6
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings    141138907 / 188596520         3 / 39
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion      107365436 / 185270852         3 / 39
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts 58358334 / 140943690          3 / 39
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator      0 / 119236169                 0 / 39
org.apache.spark.sql.catalyst.optimizer.ColumnPruning                 41291489 / 76464261           2 / 8
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder    0 / 64775042                  0 / 39
org.apache.spark.sql.catalyst.analysis.AlignViewOutput                0 / 61796761                  0 / 39
org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality   0 / 58143331                  0 / 39
{noformat}
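To find the expensive rules quickly, the dump can be parsed and sorted by total time. This is only an illustrative sketch: it assumes the text format of the metrics dump in this comment (the output of Spark's `RuleExecutor.dumpTimeSpent()`), and `RuleStat`, `parse_rule_metrics` and `wasted_ns` are hypothetical helpers, not Spark APIs. "Wasted" time here means total time minus effective time, i.e. time spent in runs that did not change the plan.

```python
import re
from typing import List, NamedTuple

# One row of the rule metrics dump: "<rule> <eff_ns> / <total_ns> <eff_runs> / <total_runs>".
# \s+ also matches a line break, so rows split over two lines still parse.
ROW = re.compile(r"(org\.apache\.spark\.\S+)\s+(\d+) / (\d+)\s+(\d+) / (\d+)")


class RuleStat(NamedTuple):
    rule: str
    effective_ns: int  # time spent in runs that changed the plan
    total_ns: int      # time spent in all runs of the rule
    effective_runs: int
    total_runs: int

    @property
    def wasted_ns(self) -> int:
        # Time spent in runs that left the plan unchanged.
        return self.total_ns - self.effective_ns


def parse_rule_metrics(dump: str) -> List[RuleStat]:
    """Hypothetical helper: extract every rule row from a metrics dump."""
    return [
        RuleStat(m.group(1), *(int(m.group(i)) for i in range(2, 6)))
        for m in ROW.finditer(dump)
    ]


# Three rows copied from the dump in this comment (times in nanoseconds).
SAMPLE = """\
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations
3350202022 / 3357847817 7 / 39
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions
11946476 / 588567543 6 / 39
org.apache.spark.sql.catalyst.analysis.TimeWindowing
0 / 519817133 0 / 39
"""
TOTAL_NS = 7.941302436e9  # "Total time: 7.941302436 seconds", in nanoseconds

if __name__ == "__main__":
    stats = sorted(parse_rule_metrics(SAMPLE), key=lambda s: s.total_ns, reverse=True)
    for s in stats:
        name = s.rule.rsplit(".", 1)[-1]
        print(f"{name:<28} total={s.total_ns / 1e9:.3f}s "
              f"share={s.total_ns / TOTAL_NS:.1%} wasted={s.wasted_ns / 1e9:.3f}s")
```

On these rows, ResolveRelations accounts for roughly 42% of the 7.94 s total, while ResolveFunctions and TimeWindowing spend almost all of their time in runs that change nothing.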
> spark3.1.1 use too long to analyze table fields
> -----------------------------------------------
>
> Key: SPARK-35365
> URL: https://issues.apache.org/jira/browse/SPARK-35365
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.1
> Reporter: yao
> Priority: Major
>
> I have a large SQL query that joins a few wide tables and contains complex
> logic. When I run it on Spark 2.4, the analysis phase takes 20 minutes; on
> Spark 3.1.1 it takes about 40 minutes.
> I need to set spark.sql.analyzer.maxIterations=1000 in Spark 3.1.1,
> or spark.sql.optimizer.maxIterations=1000 in Spark 2.4.
> There are no other special settings for this query.
> In the Spark UI, no job is generated and no executor has any active tasks.
> With the log level set to DEBUG, I can see that the query is still in the
> analysis phase, resolving field references.
> This phase takes too long.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)