YannByron commented on pull request #4565: URL: https://github.com/apache/hudi/pull/4565#issuecomment-1010570382
> @YannByron can you please add PR description, so that it's a little more clear what we're trying to tackle here?

Sure. After upgrading to Spark 3.2, the `TestMORDataSource.testCount` UT fails: the assertion that `hudiIncDF4SkipMerge.count()` equals 200 fails because the result is 100. The cause is that data in the Parquet file cannot be read when `requiredStructSchema` is empty; I think this is a problem in Parquet 1.12.1. As a workaround, I pass the full schema to the Parquet reader and project out the required columns outside the reader.

`TestMORDataSource.testPrunedFiltered` also fails when `DefaultHoodieRecordPayload` is used. Both `PAYLOAD_ORDERING_FIELD_PROP_KEY` and `PAYLOAD_EVENT_TIME_FIELD_PROP_KEY` default to `ts` and are never updated to match `preCombineField`, which causes a failure, or records not being updated with the latest precombine value. So I now set the current `preCombineField` as the value of both `PAYLOAD_ORDERING_FIELD_PROP_KEY` and `PAYLOAD_EVENT_TIME_FIELD_PROP_KEY`. It's weird that this case passes on Spark versions below 3.2, but these changes work on all Spark versions.
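To make the second issue concrete, here is a minimal toy sketch (plain Python, not Hudi code; the `merge` function and record dicts are hypothetical) of why an ordering field hardcoded to `ts` can let a stale record win when the table's actual `preCombineField` is a different column:

```python
# Toy model of a precombine merge: keep the incoming record only if its
# ordering value is at least the current one. A missing ordering field
# reads as 0, mimicking a payload that cannot find the configured field
# in the record.
def merge(current, incoming, ordering_field):
    if incoming.get(ordering_field, 0) >= current.get(ordering_field, 0):
        return incoming
    return current

current = {"id": 1, "version": 5, "value": "latest"}
stale   = {"id": 1, "version": 4, "value": "stale"}

# With the table's real preCombineField ("version"), the stale update loses:
print(merge(current, stale, "version")["value"])  # latest

# With the hardcoded default "ts" (absent from both records), both sides
# read 0, so the stale record wrongly overwrites the latest one:
print(merge(current, stale, "ts")["value"])       # stale
```

This is why wiring the configured `preCombineField` into the ordering/event-time properties matters: the comparison must run against the column the table actually precombines on.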
