Re: [PR] [HUDI-6472] fix spark sql does not ignore case [hudi]

via GitHub Wed, 06 Mar 2024 08:43:05 -0800


jonvex commented on code in PR #10826:
URL: https://github.com/apache/hudi/pull/10826#discussion_r1514821447



##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala:
##########
@@ -95,7 +95,9 @@ object InsertIntoHoodieTableCommand extends Logging with ProvidesHoodieConfig wi
     }
     val config = buildHoodieInsertConfig(catalogTable, sparkSession, isOverWritePartition, isOverWriteTable, partitionSpec, extraOptions, staticOverwritePartitionPathOpt)
 
-    val alignedQuery = alignQueryOutput(query, catalogTable, partitionSpec, sparkSession.sessionState.conf)
+    val optimizer = sparkSession.sessionState.optimizer
+    val optimizerPlan = optimizer.execute(query)
+    val alignedQuery = alignQueryOutput(optimizerPlan, catalogTable, partitionSpec, sparkSession.sessionState.conf)

Review Comment:
   This is required. I don't know how @KnightChess figured this out; I'm 
impressed. There is more detail in 
https://github.com/apache/hudi/pull/10582, but basically the optimizer checks 
after each optimization step that the output of the plan has not changed. 
Because we rename columns, the FoldablePropagation step fails that check: it 
pulls some expressions up into the projection where we do the renaming. We now 
run the optimizer on the query before doing the renaming. By then the 
expressions have already been pulled up, so when that optimization step runs 
again later, it has nothing left to do.
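
To make the ordering concrete, here is a minimal standalone sketch of "optimize first, then rename". It is not the Hudi code path. It assumes `spark-sql` is on the classpath, and `renameToLowerCase` is a hypothetical stand-in for `alignQueryOutput`, which does more than rename. It also does not reproduce the FoldablePropagation failure; it only shows the order of the two steps.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.Alias
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Project}

object OptimizeBeforeRename {

  // Hypothetical stand-in for alignQueryOutput: wraps the plan in a
  // projection that renames every output column to lower case.
  def renameToLowerCase(plan: LogicalPlan): LogicalPlan =
    Project(plan.output.map(attr => Alias(attr, attr.name.toLowerCase)()), plan)

  def run(): Seq[String] = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("optimize-before-rename")
      .getOrCreate()
    try {
      // Analyzed (resolved but not yet optimized) plan of a query whose
      // column names differ from the target names only by case.
      val analyzed = spark.sql("SELECT 1 AS ID, 'a' AS Name").queryExecution.analyzed

      // Step 1: run the optimizer before any renaming projection exists.
      val optimized = spark.sessionState.optimizer.execute(analyzed)

      // Step 2: add the renaming projection on top of the optimized plan.
      renameToLowerCase(optimized).output.map(_.name)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run().mkString(", "))
}
```

In the PR itself the same ordering is the `optimizer.execute(query)` call placed ahead of `alignQueryOutput`.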



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
