hudi-bot opened a new issue, #17358: URL: https://github.com/apache/hudi/issues/17358
When adding strict data validation within testMetadataBootstrapMORPartitionedInlineCompactionOn, the validation reveals that the partition path field reading fails (returns null) for some update records.

## JIRA info

- Link: https://issues.apache.org/jira/browse/HUDI-8837
- Type: Sub-task
- Parent: https://issues.apache.org/jira/browse/HUDI-9108
- Fix version(s):
  - 1.1.0

---

## Comments

10/Jan/25 00:12;yihua;The test is added in https://github.com/apache/hudi/pull/12490. Right now the validation excludes partition column. When adding that in the validation, the validation fails.

{code:java}
def assertDfEquals(df1: DataFrame, df2: DataFrame): Unit = {
  assertEquals(df1.count, df2.count)
  // TODO(HUDI-8723): fix reading partition path field on metadata bootstrap table
  assertEquals(0, df1.drop(partitionColName).except(df2.drop(partitionColName)).count)
  assertEquals(0, df2.drop(partitionColName).except(df1.drop(partitionColName)).count)
}
{code}
;;;

---

10/Jan/25 00:26;daviszhang;so we can remove the .drop(partitionColName) in the validation func you mentioned, ran all tests in the test suite, all green. Assigned back to you;;;

---

28/Jan/25 01:05;yihua;This is still an issue for reading the partition column value out from a bootstrapped file slice (merging skeleton and data files), using the file group reader only. Deferring this ticket to 1.0.2 release.;;;
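
---

For reference, a minimal sketch of the stricter check discussed in the comments above: the same comparison as `assertDfEquals`, but with the partition column kept, plus a null count to locate affected rows. The names `StrictValidation`, `assertDfEqualsStrict`, and `countNullPartitionValues` are hypothetical and not part of the Hudi test suite. `partitionColName` is passed in as a parameter here, and JUnit 5 assertions are assumed.

{code:java}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.junit.jupiter.api.Assertions.assertEquals

// Hypothetical helpers for illustration only.
object StrictValidation {

  // Counts rows whose partition column was read back as null,
  // which is the symptom described in this issue.
  def countNullPartitionValues(df: DataFrame, partitionColName: String): Long =
    df.filter(col(partitionColName).isNull).count()

  // Same checks as assertDfEquals, but without dropping the partition column,
  // so a null or mismatched partition value makes the comparison fail.
  def assertDfEqualsStrict(df1: DataFrame, df2: DataFrame): Unit = {
    assertEquals(df1.count(), df2.count())
    assertEquals(0L, df1.except(df2).count())
    assertEquals(0L, df2.except(df1).count())
  }
}
{code}

Note that `except` is a distinct set difference, so the row-count assertion is still needed to catch differences that only affect duplicate rows.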
