YannByron commented on pull request #4565: URL: https://github.com/apache/hudi/pull/4565#issuecomment-1010570382
> @YannByron can you please add PR description, so that it's a little more clear what we're trying to tackle here?

Sure. After upgrading to Spark 3.2, the `TestMORDataSource.testCount` UT fails: the assertion that `hudiIncDF4SkipMerge.count()` equals 200 fails because the result is 100. The cause is that data in the Parquet file cannot be read when `requiredStructSchema` is empty; I think this is a problem in Parquet 1.12.1. As a workaround, I pass the full schema to the Parquet reader and project out the required columns outside the reader.

`TestMORDataSource.testPrunedFiltered` also fails when `DefaultHoodieRecordPayload` is used. Both `PAYLOAD_ORDERING_FIELD_PROP_KEY` and `PAYLOAD_EVENT_TIME_FIELD_PROP_KEY` default to `ts` and are never updated to match `preCombineField`, which causes a failure, or records not being updated with the latest precombine value. So I now set the current `preCombineField` as the value of both `PAYLOAD_ORDERING_FIELD_PROP_KEY` and `PAYLOAD_EVENT_TIME_FIELD_PROP_KEY`. It's weird that this case passes on Spark versions below 3.2, but these changes work on all Spark versions.
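To make the second issue concrete, here is a minimal toy sketch (plain Python, not Hudi code; the `merge` function and record dicts are hypothetical) of why an ordering field hardcoded to `ts` can let a stale record win when the table's actual `preCombineField` is a different column:

```python
# Toy model of a precombine merge: keep the incoming record only if its
# ordering value is at least the current one. A missing ordering field
# reads as 0, mimicking a payload that cannot find the configured field
# in the record.
def merge(current, incoming, ordering_field):
    if incoming.get(ordering_field, 0) >= current.get(ordering_field, 0):
        return incoming
    return current

current = {"id": 1, "version": 5, "value": "latest"}
stale   = {"id": 1, "version": 4, "value": "stale"}

# With the table's real preCombineField ("version"), the stale update loses:
print(merge(current, stale, "version")["value"])  # latest

# With the hardcoded default "ts" (absent from both records), both sides
# read 0, so the stale record wrongly overwrites the latest one:
print(merge(current, stale, "ts")["value"])       # stale
```

This is why wiring the configured `preCombineField` into the ordering/event-time properties matters: the comparison must run against the column the table actually precombines on.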
