geserdugarov opened a new pull request, #11501: URL: https://github.com/apache/hudi/pull/11501
### Change Logs

Before this MR, reading a table that uses `TimestampBasedKeyGenerator` failed with:

```
Failed to cast value `2004-02-29 01` to `LongType` for partition column `ts`
```

When data is read with Spark, `listPartitionPaths()` is called together with `parsePartitionColumnValues()`, and a ClassCastException is thrown during parsing. Partition column values cannot be reconstructed from partition paths when `TimestampBasedKeyGenerator` is used, because the key generator's processing of the values loses information.

This MR fixes the ClassCastException. Some separate, older tasks are still unfinished, and they are mentioned in the added `TestSparkSqlWithTimestampKeyGenerator`:

- The fix for [HUDI-3896] overrides `shouldExtractPartitionValuesFromPartitionPath` in `BaseFileOnlyRelation`. While fixing this issue, I couldn't determine whether that override should be fixed or left as it is.
- There is no logic for `shouldExtractPartitionValuesFromPartitionPath` in `HoodieBaseHadoopFsRelationFactory`. I couldn't find a corresponding task, so I created a new one, HUDI-7925.

### Impact

Fixes the ClassCastException.

### Risk level (write none, low medium or high below)

Low. Only tables that use `TimestampBasedKeyGenerator` are affected. `TestSparkSqlWithTimestampKeyGenerator` has been added.

### Documentation Update

None needed.

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Change Logs and Impact were stated clearly
- [x] Adequate tests were added if applicable
- [ ] CI passed
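The change log says partition values can't be reconstructed from partition paths because `TimestampBasedKeyGenerator` loses information. The sketch below illustrates that loss in plain Python. It is not Hudi code, and it rests on several assumptions: the `ts` column holds epoch milliseconds, the output format is hour-granular (`yyyy-MM-dd HH`, inferred from the `2004-02-29 01` value in the error message), and UTC is used. The helper names `to_partition_path` and `parse_as_long` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumed output format, inferred from the `2004-02-29 01` value in the error message.
OUTPUT_FORMAT = "%Y-%m-%d %H"


def to_partition_path(epoch_millis: int) -> str:
    """Hypothetical: format epoch milliseconds (UTC assumed) as an hour-granular partition path."""
    dt = datetime.fromtimestamp(epoch_millis // 1000, tz=timezone.utc)
    return dt.strftime(OUTPUT_FORMAT)


def parse_as_long(partition_value: str) -> Optional[int]:
    """Hypothetical: naively cast a partition value back to the column's long type; None on failure."""
    try:
        return int(partition_value)
    except ValueError:
        return None


if __name__ == "__main__":
    ts = int(datetime(2004, 2, 29, 1, 23, 45, tzinfo=timezone.utc).timestamp() * 1000)
    path = to_partition_path(ts)
    later = to_partition_path(ts + 10 * 60 * 1000)  # ten minutes later, same hour
    print(path)                 # 2004-02-29 01
    print(path == later)        # True: minutes and seconds are lost
    print(parse_as_long(path))  # None: the formatted value is not a long
```

Two different `ts` values map to the same path, so no parser can recover the original value from the path alone. A direct cast to a long also fails, which is the situation the `Failed to cast value` error reports.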
