KnightChess opened a new pull request, #10582:
URL: https://github.com/apache/hudi/pull/10582
### Change Logs
First:
`SimpleAnalyzer` is case-sensitive, so this change replaces it with the sessionState analyzer.
Second:
Once the first change is in place, the following exception is thrown:
```shell
Unexpected exception thrown: java.lang.RuntimeException: After applying rule org.apache.spark.sql.catalyst.optimizer.FoldablePropagation in batch Operator Optimization before Inferring Filters, the structural integrity of the plan is broken.
org.opentest4j.AssertionFailedError: Unexpected exception thrown: java.lang.RuntimeException: After applying rule org.apache.spark.sql.catalyst.optimizer.FoldablePropagation in batch Operator Optimization before Inferring Filters, the structural integrity of the plan is broken.
    at org.junit.jupiter.api.AssertDoesNotThrow.createAssertionFailedError(AssertDoesNotThrow.java:83)
    at org.junit.jupiter.api.AssertDoesNotThrow.assertDoesNotThrow(AssertDoesNotThrow.java:54)
    at org.junit.jupiter.api.AssertDoesNotThrow.assertDoesNotThrow(AssertDoesNotThrow.java:37)
    at org.junit.jupiter.api.Assertions.assertDoesNotThrow(Assertions.java:3060)
    at org.apache.spark.sql.hudi.TestInsertTable.$anonfun$new$226(TestInsertTable.scala:2469)
    at org.apache.spark.sql.hudi.TestInsertTable.$anonfun$new$226$adapted(TestInsertTable.scala:2438)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at org.apache.spark.sql.hudi.TestInsertTable.$anonfun$new$225(TestInsertTable.scala:2438)
```
This exception occurs because we replace the column names in the DataFrame schema with the primaryKey name, since Avro is case-sensitive.
The optimizer rule `FoldablePropagation` replaces attributes with aliases of the original foldable expressions where possible.
We therefore need to run the optimizer on the plan before replacing the attribute name with the primaryKey name.
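As a rough illustration of the renaming step, here is a plain-Python sketch. It is not Hudi's implementation: `align_to_primary_key` is a hypothetical helper, and the column list is made up to mirror the `ID` / `id` case discussed here.

```python
def align_to_primary_key(columns, primary_keys):
    """Rename columns that match a primary key case-insensitively
    to the exact spelling of that primary key (hypothetical helper)."""
    by_lower = {pk.lower(): pk for pk in primary_keys}
    return [by_lower.get(col.lower(), col) for col in columns]


# The query projects `ID`, but the table declares its primary key as `id`.
query_columns = ["ID", "name", "price", "ts"]
aligned = align_to_primary_key(query_columns, ["id"])
print(aligned)
```

Only the column that matches the primary key changes spelling; every other column keeps its original name.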
```shell
If primaryKey is `id` and the logical plan is already resolved, replacing the name of ID#22 with `id` produces a logical plan like:
Project [id#22, name#24, price#25, ts#26L]
+- Sort [ID#22 ASC NULLS FIRST], true
   +- Project [1 AS ID#22, name#24, price#25, ts#26L]
      +- SubqueryAlias spark_catalog.default.h1
         +- Relation default.h1[ID#23,name#24,price#25,ts#26L] parquet
FoldablePropagation then optimizes this logical plan to:
Project [1 AS ID#22, name#24, price#25, ts#26L]
+- Sort [1 ASC NULLS FIRST], true
   +- Project [1 AS ID#22, name#24, price#25, ts#26L]
      +- Relation default.h1[ID#23,name#24,price#25,ts#26L] parquet
In the optimizer, `RuleExecutor` uses `isPlanIntegral` to check that the schemas of prePlan and curPlan are the same. They are not, so the unit test fails.
```
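The ordering problem can be reproduced with a small toy model. This is a simplified plain-Python sketch, not Spark's Catalyst code: `Attr`, `Alias`, `propagate_foldables`, `rename`, and `schema` are hypothetical stand-ins, and the integrity check is reduced to comparing output column names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Attr:
    name: str
    expr_id: int


@dataclass(frozen=True)
class Alias:
    literal: int
    name: str
    expr_id: int


def propagate_foldables(parent, child):
    """Toy FoldablePropagation: swap parent attributes for the child's
    literal aliases, matching on expr_id rather than on name."""
    foldables = {e.expr_id: e for e in child if isinstance(e, Alias)}
    return [foldables.get(e.expr_id, e) for e in parent]


def rename(project, expr_id, new_name):
    """Give the expression with this expr_id a new output name."""
    return [replace(e, name=new_name) if e.expr_id == expr_id else e for e in project]


def schema(project):
    return [e.name for e in project]


child = [Alias(1, "ID", 22), Attr("name", 24), Attr("price", 25), Attr("ts", 26)]
original_top = [Attr("ID", 22), Attr("name", 24), Attr("price", 25), Attr("ts", 26)]

# Rename first, then optimize: the alias brings back the old spelling `ID`.
renamed_top = rename(original_top, 22, "id")
optimized_top = propagate_foldables(renamed_top, child)
print(schema(renamed_top), schema(optimized_top))

# Optimize first, then rename: the schema is stable across the optimizer rule.
optimized_first = propagate_foldables(original_top, child)
renamed_after = rename(optimized_first, 22, "id")
print(schema(original_top), schema(optimized_first), schema(renamed_after))
```

In the first ordering the output names before and after the rule differ only in case, which is the situation the integrity check rejects. In the second ordering the rule leaves the names unchanged and the rename is applied afterwards.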
### Impact
`insert into` and `merge sql` statements now run an additional optimizer pass on the query plan.
### Risk level (write none, low medium or high below)
low
### Documentation Update
none
### Contributor's checklist
- [ ] Read through [contributor's
guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed