geserdugarov opened a new pull request, #11501:
URL: https://github.com/apache/hudi/pull/11501

   ### Change Logs
   
   Before this MR, reading a table written with `TimestampBasedKeyGenerator` failed with:
   ```
   Failed to cast value `2004-02-29 01` to `LongType` for partition column `ts`
   ```
   When Spark reads the data, `listPartitionPaths()` is called with `parsePartitionColumnValues()`, and parsing throws a ClassCastException. Partition column values cannot be reconstructed from partition paths when `TimestampBasedKeyGenerator` is used, because information is lost when the key generator processes the values.
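
The loss of information can be illustrated with a minimal, standalone Java sketch. It does not use Hudi or Spark code: `PartitionPathRoundTrip`, `toPartitionPath` and `parsesAsLong` are hypothetical names, the `yyyy-MM-dd HH` output format and UTC zone are assumptions inferred from the `2004-02-29 01` value in the error above, and `Long.parseLong` only stands in for the cast to `LongType`.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class PartitionPathRoundTrip {
    // Assumed output format, inferred from the `2004-02-29 01` value in the error message.
    static final DateTimeFormatter OUTPUT_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH").withZone(ZoneOffset.UTC);

    // Hypothetical stand-in for the key generator: epoch millis -> partition path value.
    static String toPartitionPath(long epochMillis) {
        return OUTPUT_FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    // Stand-in for casting a partition path value back to the column's long type.
    static boolean parsesAsLong(String partitionValue) {
        try {
            Long.parseLong(partitionValue);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        long first = 1078016523000L;  // 2004-02-29 01:02:03 UTC
        long second = 1078016524000L; // one second later, same hour
        String path = toPartitionPath(first);
        System.out.println(path);                                  // formatted partition value
        System.out.println(parsesAsLong(path));                    // can it be read back as a long?
        System.out.println(path.equals(toPartitionPath(second)));  // do distinct inputs collide?
    }
}
```

Under these assumptions the formatted value is not a valid long, and two different timestamps within the same hour map to the same partition value, so the original `ts` cannot be recovered from the path alone.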
   
   This MR fixes the ClassCastException, but separate, pre-existing issues remain open. They are noted in the added `TestSparkSqlWithTimestampKeyGenerator`:
   
   - The fix for [HUDI-3896] overwrites `shouldExtractPartitionValuesFromPartitionPath` in `BaseFileOnlyRelation`. While working on this issue, I could not determine whether that should be changed or left as is.
   
   - `HoodieBaseHadoopFsRelationFactory` has no logic for `shouldExtractPartitionValuesFromPartitionPath`. I could not find an existing task for this, so I created a new one, HUDI-7925.
   
   ### Impact
   
   Fixes ClassCastException.
   
   ### Risk level (write none, low medium or high below)
   
   Low. Only affects cases where `TimestampBasedKeyGenerator` is used. `TestSparkSqlWithTimestampKeyGenerator` is added to cover the change.
   
   ### Documentation Update
   
   No need.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [ ] CI passed
   

