singhpk234 commented on a change in pull request #3992: URL: https://github.com/apache/iceberg/pull/3992#discussion_r796278084
########## File path: spark/v3.2/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/RewriteMergeIntoTable.scala ########## @@ -186,7 +187,7 @@ object RewriteMergeIntoTable extends RewriteRowLevelCommand { // disable broadcasts for the target table to perform the cardinality check val joinType = if (notMatchedActions.isEmpty) LeftOuter else FullOuter val joinHint = JoinHint(leftHint = Some(HintInfo(Some(NO_BROADCAST_HASH))), rightHint = None) - val joinPlan = Join(targetTableProj, sourceTableProj, joinType, Some(cond), joinHint) + val joinPlan = Join(NoStatsUnaryNode(targetTableProj), sourceTableProj, joinType, Some(cond), joinHint) Review comment: [doubt] Hi All, Can adding a new UnaryNode mess with predicatePushDown rule ? At present predicatePushDown rule maintains the list of unary logical operator from which it's safe to push down the filter's from. so when a new node we introduce since it's not present in the allow-list and hence filter will get stuck at the parent of that node ... and will not be pushed through, which in turn cause perf regression. Not sure if this is the case what's the best possible way to handle it. [Spark 3.2 code pointer](https://github.com/apache/spark/blob/v3.2.0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L1528-L1550) ``` case filter @ Filter(_, u: UnaryNode) if canPushThrough(u) && u.expressions.forall(_.deterministic) => pushDownPredicate(filter, u.child) { predicate => u.withNewChildren(Seq(Filter(predicate, u.child))) } } def canPushThrough(p: UnaryNode): Boolean = p match { // Note that some operators (e.g. project, aggregate, union) are being handled separately // (earlier in this rule). case _: AppendColumns => true case _: Distinct => true case _: Generate => true case _: Pivot => true case _: RepartitionByExpression => true case _: Repartition => true case _: ScriptTransformation => true case _: Sort => true case _: BatchEvalPython => true case _: ArrowEvalPython => true case _: Expand => true case _ => false } ``` I recently hit something similar (not in iceberg), hence posting here just in case it's applicable here as well. cc @aokolnychyi @rdblue @RussellSpitzer -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org