jonvex commented on code in PR #10826:
URL: https://github.com/apache/hudi/pull/10826#discussion_r1514821447
##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala:
##########
@@ -95,7 +95,9 @@ object InsertIntoHoodieTableCommand extends Logging with
ProvidesHoodieConfig wi
}
val config = buildHoodieInsertConfig(catalogTable, sparkSession, isOverWritePartition, isOverWriteTable, partitionSpec, extraOptions, staticOverwritePartitionPathOpt)
- val alignedQuery = alignQueryOutput(query, catalogTable, partitionSpec, sparkSession.sessionState.conf)
+ val optimizer = sparkSession.sessionState.optimizer
+ val optimizerPlan = optimizer.execute(query)
+ val alignedQuery = alignQueryOutput(optimizerPlan, catalogTable, partitionSpec, sparkSession.sessionState.conf)
Review Comment:
This is required. I don't know how @KnightChess figured this out; I'm
impressed. You can see more detail in
https://github.com/apache/hudi/pull/10582, but basically the optimizer checks
that the output of the plan doesn't change after each optimization step.
Because of the name changes, the FoldablePropagation step fails that check: it
pulls some expressions up into the projection where we do the renaming. We now
run the optimizer on the query before doing the renaming. By then the
expressions have already been pulled up, so when that optimization step runs
again later, it does nothing.
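
To make the idea concrete, here is a minimal sketch of running the optimizer on a plan up front and then running it a second time. It is not the Hudi code path: the `OptimizeBeforeRename` object, the `optimizeTwice` helper, and the sample query with a constant `flag` column are hypothetical, chosen only because a constant column is the kind of expression FoldablePropagation works on. It assumes Spark SQL is on the classpath and uses the same `sessionState.optimizer.execute` call as the diff above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Hypothetical illustration, not part of Hudi.
object OptimizeBeforeRename {

  // Optimizes a sample analyzed plan, then optimizes the result again.
  // Returns (plan after the first pass, plan after the second pass).
  def optimizeTwice(spark: SparkSession): (LogicalPlan, LogicalPlan) = {
    // Sample query (an assumption): a constant column that is also used for sorting.
    val query: LogicalPlan = spark.range(3)
      .selectExpr("id", "1 AS flag")
      .orderBy("flag")
      .queryExecution
      .analyzed

    val optimizer = spark.sessionState.optimizer
    val optimizedOnce = optimizer.execute(query)           // the up-front pass, as in the diff
    val optimizedTwice = optimizer.execute(optimizedOnce)  // stands in for the later pass
    (optimizedOnce, optimizedTwice)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("optimize-before-rename")
      .getOrCreate()
    try {
      val (once, twice) = optimizeTwice(spark)
      // Output column names are what any later renaming step would see.
      println(s"output after first pass:  ${once.output.map(_.name)}")
      println(s"output after second pass: ${twice.output.map(_.name)}")
      // Report, rather than assume, whether the second pass changed the plan.
      println(s"second pass produced an equivalent plan: ${once.sameResult(twice)}")
    } finally {
      spark.stop()
    }
  }
}
```

The sketch prints the comparison instead of asserting it, since whether a second pass is a strict no-op may depend on the Spark version and on the query.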