[jira] [Commented] (SPARK-35365) spark3.1.1 use too long to analyze table fields

Yuming Wang (Jira) Tue, 11 May 2021 00:05:10 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-35365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17342354#comment-17342354
 ]


Yuming Wang commented on SPARK-35365:
-------------------------------------

[~xiaohua] Could you check which rule affects the performance? For example:
{noformat}
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 3022
Total time: 7.941302436 seconds

Rule                                                                    Effective Time / Total Time   Effective Runs / Total Runs

org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations        3350202022 / 3357847817       7 / 39
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions        11946476 / 588567543          6 / 39
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences       516175887 / 577794974         15 / 39
org.apache.spark.sql.catalyst.analysis.TimeWindowing                    0 / 519817133                 0 / 39
org.apache.spark.sql.catalyst.analysis.DecimalPrecision                 226306881 / 271650752         11 / 39
org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin              9838775 / 202214973           1 / 6
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings      141138907 / 188596520         3 / 39
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion        107365436 / 185270852         3 / 39
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts   58358334 / 140943690          3 / 39
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator        0 / 119236169                 0 / 39
org.apache.spark.sql.catalyst.optimizer.ColumnPruning                   41291489 / 76464261           2 / 8
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder      0 / 64775042                  0 / 39
org.apache.spark.sql.catalyst.analysis.AlignViewOutput                  0 / 61796761                  0 / 39
org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality     0 / 58143331                  0 / 39
{noformat}
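
To find the expensive rule quickly, the dump can be ranked by total time. Below is a minimal sketch, not part of Spark: it assumes the text has the layout shown above (as far as I know this is what `org.apache.spark.sql.catalyst.rules.RuleExecutor.dumpTimeSpent()` returns in Spark 3.x, with `RuleExecutor.resetMetrics()` clearing the counters beforehand) and that the time columns are in nanoseconds. The names `SAMPLE`, `ROW` and `rank_rules` are hypothetical helpers introduced only for this illustration.

```python
import re

# Three rows copied from the dump above, one rule per line:
# rule name, "effective ns / total ns", "effective runs / total runs".
SAMPLE = """\
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations  3350202022 / 3357847817  7 / 39
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions  11946476 / 588567543  6 / 39
org.apache.spark.sql.catalyst.analysis.TimeWindowing  0 / 519817133  0 / 39
"""

# Matches only data rows; header and summary lines contain no such number pairs.
ROW = re.compile(r"^(\S+)\s+(\d+)\s*/\s*(\d+)\s+(\d+)\s*/\s*(\d+)\s*$")


def rank_rules(dump):
    """Return (rule, total_s, effective_s, effective_runs, total_runs),
    sorted by total time, slowest rule first."""
    rows = []
    for line in dump.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # skip headers, blank lines and totals
        # Keep only the last dotted component, e.g. "Analyzer$ResolveRelations".
        name = m.group(1).rsplit(".", 1)[-1]
        eff_ns, total_ns, eff_runs, total_runs = (int(g) for g in m.groups()[1:])
        # Assumption: times are reported in nanoseconds.
        rows.append((name, total_ns / 1e9, eff_ns / 1e9, eff_runs, total_runs))
    return sorted(rows, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    for name, total_s, eff_s, eff_runs, total_runs in rank_rules(SAMPLE):
        print(f"{name:<30} total={total_s:.3f}s effective={eff_s:.3f}s "
              f"runs={eff_runs}/{total_runs}")
```

A rule with a large total time but zero effective runs (TimeWindowing in the example) spends its time scanning the plan without changing it, so it is worth comparing both columns between Spark 2.4 and 3.1.1.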


> spark3.1.1 use too long to analyze table fields
> -----------------------------------------------
>
>                 Key: SPARK-35365
>                 URL: https://issues.apache.org/jira/browse/SPARK-35365
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.1
>            Reporter: yao
>            Priority: Major
>
> I have a large SQL query that joins a few wide tables and has complex logic.
> When I run it in Spark 2.4, the analysis phase takes 20 minutes; with Spark
> 3.1.1 it takes about 40 minutes.
> I need to set spark.sql.analyzer.maxIterations=1000 in Spark 3.1.1,
> or spark.sql.optimizer.maxIterations=1000 in Spark 2.4.
> There are no other special settings.
> In the Spark UI I see that no job is generated and no executor has any
> active tasks. With the log level set to debug, I can see that the query
> is still in the analysis phase, resolving field references.
> This phase takes too long.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

