[ https://issues.apache.org/jira/browse/SPARK-23171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16609279#comment-16609279 ]
Wenchen Fan commented on SPARK-23171:
-------------------------------------
I'm removing the target version, since there has been no progress so far.
> Reduce the time costs of the rule runs that do not change the plans
> --------------------------------------------------------------------
>
> Key: SPARK-23171
> URL: https://issues.apache.org/jira/browse/SPARK-23171
> Project: Spark
> Issue Type: Umbrella
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Xiao Li
> Priority: Major
>
> Below are the timing stats of the Analyzer/Optimizer rules. Try to improve
> the rules and reduce their time costs, especially for the runs that do not
> change the plans.
> {noformat}
> === Metrics of Analyzer/Optimizer Rules ===
> Total number of runs = 175827
> Total time: 20.699042877 seconds
> Rule  Total Time  Effective Time  Total Runs  Effective Runs
> org.apache.spark.sql.catalyst.optimizer.ColumnPruning  2340563794  1338268224  1875  761
> org.apache.spark.sql.catalyst.analysis.Analyzer$CTESubstitution  1632672623  1625071881  788  37
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions  1395087131  347339931  1982  38
> org.apache.spark.sql.catalyst.optimizer.PruneFilters  1177711364  21344174  1590  3
> org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries  1145135465  1131417128  285  39
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences  1008347217  663112062  1982  616
> org.apache.spark.sql.catalyst.optimizer.ReorderJoin  767024424  693001699  1590  132
> org.apache.spark.sql.catalyst.analysis.Analyzer$FixNullability  598524650  40802876  742  12
> org.apache.spark.sql.catalyst.analysis.DecimalPrecision  595384169  436153128  1982  211
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery  548178270  459695885  1982  49
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts  423002864  139869503  1982  86
> org.apache.spark.sql.catalyst.optimizer.BooleanSimplification  405544962  17250184  1590  7
> org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin  383837603  284174662  1590  708
> org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAliases  372901885  3362332  1590  9
> org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints  364628214  343815519  285  192
> org.apache.spark.sql.execution.datasources.FindDataSourceTable  303293296  285344766  1982  233
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions  233195019  92648171  1982  294
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion  220568919  73932736  1982  38
> org.apache.spark.sql.catalyst.optimizer.NullPropagation  207976072  9072305  1590  26
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings  207027618  37834145  1982  40
> org.apache.spark.sql.catalyst.optimizer.PushDownPredicate  203382836  176482044  1590  783
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion  192152216  15738573  1982  1
> org.apache.spark.sql.catalyst.optimizer.ConstantFolding  191624610  58857553  1590  126
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion  183008262  78280172  1982  29
> org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator  176935299  0  1982  0
> org.apache.spark.sql.catalyst.analysis.ResolveTimeZone  170161002  74354990  1982  417
> org.apache.spark.sql.catalyst.optimizer.ReorderAssociativeOperator  166173174  0  1590  0
> org.apache.spark.sql.catalyst.optimizer.OptimizeIn  155410763  8197045  1590  16
> org.apache.spark.sql.catalyst.optimizer.SimplifyCaseConversionExpressions  153726565  0  1590  0
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$IfCoercion  153013269  0  1982  0
> org.apache.spark.sql.catalyst.optimizer.SimplifyCasts  146693495  13537077  1590  69
> org.apache.spark.sql.catalyst.optimizer.SimplifyBinaryComparison  144818581  0  1590  0
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$DateTimeOperations  143943308  6889302  1982  27
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$Division  142925142  12653147  1982  8
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality  142775965  0  1982  0
> org.apache.spark.sql.catalyst.optimizer.SimplifyConditionals  141509150  0  1590  0
> org.apache.spark.sql.catalyst.optimizer.LikeSimplification  132387762  636851  1590  1
> org.apache.spark.sql.catalyst.optimizer.RemoveDispensableExpressions  127412361  0  1590  0
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame  126772671  9317887  1982  21
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$ConcatCoercion  116484407  0  1982  0
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$EltCoercion  115402736  0  1982  0
> org.apache.spark.sql.catalyst.analysis.ResolveCreateNamedStruct  115071447  0  1982  0
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder  113115366  4563584  1982  14
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$WindowFrameCoercion  107747140  0  1982  0
> org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin  105020607  13907906  1590  11
> org.apache.spark.sql.catalyst.analysis.TimeWindowing  101018029  0  1982  0
> org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery  98043747  7044358  1590  7
> org.apache.spark.sql.catalyst.optimizer.ConstantPropagation  95173536  0  1590  0
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$StackCoercion  94134701  0  1982  0
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics  84419135  33892351  1982  11
> org.apache.spark.sql.execution.datasources.DataSourceAnalysis  83297816  77023484  742  24
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer  77880196  36980636  1982  148
> org.apache.spark.sql.execution.datasources.PreprocessTableCreation  74091407  0  742  0
> org.apache.spark.sql.catalyst.analysis.CleanupAliases  73837147  37105855  1086  344
> org.apache.spark.sql.catalyst.optimizer.RemoveRedundantProject  73534618  31752937  1875  344
> org.apache.spark.sql.execution.datasources.v2.PushDownOperatorsToDataSource  70120541  0  285  0
> org.apache.spark.sql.catalyst.optimizer.FoldablePropagation  67941776  0  1590  0
> org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions  62917712  22092402  1982  23
> org.apache.spark.sql.catalyst.optimizer.CombineFilters  61116313  41021442  1590  449
> org.apache.spark.sql.catalyst.optimizer.CollapseProject  60872313  30994661  1875  279
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases  58453489  12511798  1982  47
> org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions  58154315  0  750  0
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions  54678669  0  285  0
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveMissingReferences  53518211  7209138  1982  8
> org.apache.spark.sql.catalyst.optimizer.PullupCorrelatedPredicates  45840637  29436271  285  23
> org.apache.spark.sql.catalyst.optimizer.CollapseRepartition  43321502  0  1590  0
> org.apache.spark.sql.catalyst.analysis.Analyzer$PullOutNondeterministic  42117785  0  742  0
> org.apache.spark.sql.catalyst.optimizer.ComputeCurrentTime  40843184  0  285  0
> org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion  39997563  5899863  1590  10
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations  39412748  22359409  1990  233
> org.apache.spark.sql.catalyst.optimizer.CombineUnions  38823264  1534424  1875  17
> org.apache.spark.sql.catalyst.analysis.TypeCoercion$WidenSetOperationTypes  38712372  7912192  1982  9
> org.apache.spark.sql.catalyst.analysis.Analyzer$HandleNullInputsForUDF  38281659  0  742  0
> org.apache.spark.sql.catalyst.optimizer.DecimalAggregates  38277381  17245272  385  100
> org.apache.spark.sql.execution.datasources.ResolveSQLOnFile  37342019  0  1982  0
> org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates  36958378  1207331  1982  46
> org.apache.spark.sql.catalyst.optimizer.CombineLimits  36794793  0  1590  0
> org.apache.spark.sql.catalyst.optimizer.LimitPushDown  36378469  0  1590  0
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNewInstance  34611065  0  1982  0
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast  33734785  0  1982  0
> org.apache.spark.sql.catalyst.optimizer.EliminateSorts  33731370  0  1590  0
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOrdinalInOrderByAndGroupBy  33251765  1395920  1982  4
> org.apache.spark.sql.catalyst.optimizer.EliminateSerialization  30890996  0  1590  0
> org.apache.spark.sql.catalyst.optimizer.CollapseWindow  29512740  0  1590  0
> org.apache.spark.sql.catalyst.optimizer.ReplaceIntersectWithSemiJoin  29396498  1492235  300  7
> org.apache.spark.sql.catalyst.optimizer.RewritePredicateSubquery  29301037  21706110  285  148
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggAliasInGroupBy  23819074  0  1982  0
> org.apache.spark.sql.catalyst.analysis.SubstituteUnresolvedOrdinals  23136089  10062248  788  4
> org.apache.spark.sql.execution.datasources.PreprocessTableInsertion  20886216  0  742  0
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot  20639329  0  1982  0
> org.apache.spark.sql.catalyst.analysis.ResolveTableValuedFunctions  20293829  0  1990  0
> org.apache.spark.sql.catalyst.analysis.ResolveInlineTables  20255898  0  1982  0
> org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveBroadcastHints  20250460  0  750  0
> org.apache.spark.sql.catalyst.expressions.codegen.package$ExpressionCanonicalizer$CleanExpressions  19990727  39271  8280  26
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate  19578333  0  1982  0
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubqueryColumnAliases  19414993  0  1982  0
> org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts  19291402  0  285  0
> org.apache.spark.sql.catalyst.optimizer.ReplaceExpressions  18790135  0  285  0
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin  18535762  0  1982  0
> org.apache.spark.sql.catalyst.optimizer.EliminateMapObjects  17835919  0  285  0
> org.apache.spark.sql.catalyst.optimizer.PropagateEmptyRelation  15200130  1525030  288  3
> org.apache.spark.sql.catalyst.optimizer.GetCurrentDatabase  14490778  0  285  0
> org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases  14021504  12790020  285  215
> org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates  13439887  0  285  0
> org.apache.spark.sql.catalyst.analysis.EliminateBarriers  12336513  0  1086  0
> org.apache.spark.sql.execution.OptimizeMetadataOnlyQuery  12082986  0  285  0
> org.apache.spark.sql.catalyst.analysis.UpdateOuterReferences  10792280  0  742  0
> org.apache.spark.sql.execution.python.ExtractPythonUDFFromAggregate  8978897  0  285  0
> org.apache.spark.sql.catalyst.analysis.EliminateUnions  8886439  0  788  0
> org.apache.spark.sql.catalyst.analysis.AliasViewChild  8317231  0  742  0
> org.apache.spark.sql.catalyst.optimizer.RemoveRepetitionFromGroupExpressions  7964788  184237  286  1
> org.apache.spark.sql.catalyst.analysis.Analyzer$WindowsSubstitution  7396593  0  788  0
> org.apache.spark.sql.catalyst.analysis.ResolveHints$RemoveAllHints  6986385  0  750  0
> org.apache.spark.sql.catalyst.analysis.EliminateView  6518436  0  285  0
> org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation  6452598  0  288  0
> org.apache.spark.sql.catalyst.optimizer.RemoveLiteralFromGroupExpressions  5510866  0  286  0
> org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithFilter  5393429  0  300  0
> org.apache.spark.sql.catalyst.optimizer.SimplifyCreateArrayOps  5296187  0  1590  0
> org.apache.spark.sql.catalyst.optimizer.SimplifyCreateStructOps  5261249  0  1590  0
> org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithAntiJoin  5152594  925260  300  1
> org.apache.spark.sql.catalyst.optimizer.CombineConcats  4916416  0  1590  0
> org.apache.spark.sql.catalyst.optimizer.CombineTypedFilters  4810314  0  285  0
> org.apache.spark.sql.catalyst.optimizer.SimplifyCreateMapOps  4674195  0  1590  0
> org.apache.spark.sql.catalyst.optimizer.ReplaceDistinctWithAggregate  4406136  727433  300  15
> org.apache.spark.sql.catalyst.optimizer.ReplaceDeduplicateWithAggregate  4252456  0  285  0
> org.apache.spark.sql.catalyst.optimizer.EliminateDistinct  1920392  0  285  0
> org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder  1855658  0  285  0
>
> {noformat}
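
As a rough illustration of how the dump quoted above could be used to pick targets, the sketch below ranks a few rules by the time spent in runs that did not change the plan. It rests on assumptions that are not stated in the issue: that the time columns are nanoseconds, and that "Total Time minus Effective Time" is a fair proxy for that wasted time. The RuleStat type and its field names are hypothetical; only the four sample rows are copied from the dump.

```python
from typing import NamedTuple


class RuleStat(NamedTuple):
    """One row of the metrics dump (hypothetical helper type)."""
    rule: str
    total_ns: int        # "Total Time" column, assumed to be nanoseconds
    effective_ns: int    # "Effective Time" column, assumed to be nanoseconds
    total_runs: int
    effective_runs: int

    @property
    def wasted_ns(self) -> int:
        # Assumption: time not counted as effective was spent in no-op runs.
        return self.total_ns - self.effective_ns

    @property
    def noop_run_ratio(self) -> float:
        # Fraction of runs that left the plan unchanged.
        return 1.0 - self.effective_runs / self.total_runs


# Sample rows copied from the dump (short rule names for readability).
stats = [
    RuleStat("ColumnPruning", 2340563794, 1338268224, 1875, 761),
    RuleStat("PruneFilters", 1177711364, 21344174, 1590, 3),
    RuleStat("RemoveRedundantAliases", 372901885, 3362332, 1590, 9),
    RuleStat("ExtractGenerator", 176935299, 0, 1982, 0),
]

# Rules with the most time spent in runs that changed nothing come first.
ranked = sorted(stats, key=lambda s: s.wasted_ns, reverse=True)

for s in ranked:
    print(f"{s.rule:<24} wasted={s.wasted_ns / 1e9:.3f}s "
          f"no-op runs={s.noop_run_ratio:.1%}")
```

On these four rows, PruneFilters ranks first even though ColumnPruning has the larger total time, because almost none of the PruneFilters runs changed the plan.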