[PR] [HUDI-6472] fix spark sql does not ignore case [hudi]

via GitHub Mon, 29 Jan 2024 06:01:41 -0800


KnightChess opened a new pull request, #10582:
URL: https://github.com/apache/hudi/pull/10582


   ### Change Logs
   
   first:
   `SimpleAnalyzer` is case-sensitive, so it is replaced with the sessionState analyzer.
   
   second:
   After the `first` change, the following exception is thrown:
   ```shell
   Unexpected exception thrown: java.lang.RuntimeException: After applying rule org.apache.spark.sql.catalyst.optimizer.FoldablePropagation in batch Operator Optimization before Inferring Filters, the structural integrity of the plan is broken.
   org.opentest4j.AssertionFailedError: Unexpected exception thrown: java.lang.RuntimeException: After applying rule org.apache.spark.sql.catalyst.optimizer.FoldablePropagation in batch Operator Optimization before Inferring Filters, the structural integrity of the plan is broken.
        at org.junit.jupiter.api.AssertDoesNotThrow.createAssertionFailedError(AssertDoesNotThrow.java:83)
        at org.junit.jupiter.api.AssertDoesNotThrow.assertDoesNotThrow(AssertDoesNotThrow.java:54)
        at org.junit.jupiter.api.AssertDoesNotThrow.assertDoesNotThrow(AssertDoesNotThrow.java:37)
        at org.junit.jupiter.api.Assertions.assertDoesNotThrow(Assertions.java:3060)
        at org.apache.spark.sql.hudi.TestInsertTable.$anonfun$new$226(TestInsertTable.scala:2469)
        at org.apache.spark.sql.hudi.TestInsertTable.$anonfun$new$226$adapted(TestInsertTable.scala:2438)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at org.apache.spark.sql.hudi.TestInsertTable.$anonfun$new$225(TestInsertTable.scala:2438)
   
   ```
   This happens because we use the primaryKey name to replace the matching column name in the DataFrame schema, since Avro is case-sensitive.
   The optimizer rule `FoldablePropagation` replaces attributes with the aliases of the original foldable expressions where possible.
   We therefore need to run the optimizer before we replace the column name with the primaryKey name.
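The case mismatch described above can be illustrated with a small, self-contained sketch. It is a simplified model, not Spark or Hudi code: `resolve` and `align_to_primary_key` are hypothetical helpers, and the column list is taken from the plan shown in this description.

```python
# Simplified model only: these helpers are hypothetical, not Spark or Hudi APIs.

def resolve(name, columns, case_sensitive):
    """Return the first column matching `name`, or None if nothing matches."""
    for col in columns:
        matches = (col == name) if case_sensitive else (col.lower() == name.lower())
        if matches:
            return col
    return None


def align_to_primary_key(columns, primary_key):
    """Rename the column that matches the key case-insensitively to the key's exact spelling."""
    return [primary_key if c.lower() == primary_key.lower() else c for c in columns]


query_columns = ["ID", "name", "price", "ts"]  # names as written in the SQL
primary_key = "id"                             # name as declared on the table

# A case-sensitive lookup (as a case-sensitive analyzer or Avro would do) misses the column.
print(resolve(primary_key, query_columns, case_sensitive=True))   # None
# A case-insensitive lookup finds it, but under a different spelling.
print(resolve(primary_key, query_columns, case_sensitive=False))  # ID
# Aligning the spelling lets a later case-sensitive (Avro-style) lookup succeed.
aligned = align_to_primary_key(query_columns, primary_key)
print(resolve(primary_key, aligned, case_sensitive=True))         # id
```

The sketch only shows why the column name has to be rewritten to the primaryKey spelling at some point; when that rewrite happens relative to optimization is the subject of this change.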
   ```shell
   If primaryKey is id and the logical plan is already resolved, replacing the name of ID#22 with id gives a logical plan like:
      Project [id#22, name#24, price#25, ts#26L]
        +- Sort [ID#22 ASC NULLS FIRST], true
           +- Project [1 AS ID#22, name#24, price#25, ts#26L]
              +- SubqueryAlias spark_catalog.default.h1
                 +- Relation default.h1[ID#23,name#24,price#25,ts#26L] parquet
   FoldablePropagation then optimizes this logical plan to:
      Project [1 AS ID#22, name#24, price#25, ts#26L]
        +- Sort [1 ASC NULLS FIRST], true
           +- Project [1 AS ID#22, name#24, price#25, ts#26L]
              +- Relation default.h1[ID#23,name#24,price#25,ts#26L] parquet
   In the optimizer, `RuleExecutor` uses `isPlanIntegral` to check that the schemas of prePlan and curPlan are the same, so the unit test fails.
   ```
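The ordering problem can be modelled without Spark. The sketch below is a deliberately simplified stand-in: `output_names_match` and `rename_output` are hypothetical helpers, the check compares only output column names (case-sensitively), and it is not the actual `isPlanIntegral` implementation.

```python
# Simplified model only: hypothetical helpers, not Spark's integrity check.

def output_names_match(prev_names, cur_names):
    """Case-sensitive, order-sensitive comparison of plan output names."""
    return list(prev_names) == list(cur_names)


def rename_output(names, old, new):
    """Rename one output column, leaving the others untouched."""
    return [new if n == old else n for n in names]


analyzed = ["ID", "name", "price", "ts"]  # output of the resolved plan

# Failing order: rename to the primaryKey spelling first, then optimize.
# The rule rewrites the top Project back to `1 AS ID`, so the names diverge.
renamed_first = rename_output(analyzed, "ID", "id")
after_rule = ["ID", "name", "price", "ts"]
print(output_names_match(renamed_first, after_rule))  # False

# Order used by this change: optimize first, rename afterwards.
optimized = ["ID", "name", "price", "ts"]
print(output_names_match(analyzed, optimized))        # True
print(rename_output(optimized, "ID", "id"))           # ['id', 'name', 'price', 'ts']
```

In this model the check passes only when the rename happens after the optimizer has finished, which matches the reasoning given for optimizing the plan before substituting the primaryKey name.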
   
   ### Impact
   
   `insert into` and `merge` SQL statements now incur an additional optimizer pass during analysis.
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

