hudi-bot opened a new issue, #17358:
URL: https://github.com/apache/hudi/issues/17358

   When strict data validation is added to
testMetadataBootstrapMORPartitionedInlineCompactionOn, it reveals that
reading the partition path field fails (returns null) for some update
records.
   
   ## JIRA info
   
   - Link: https://issues.apache.org/jira/browse/HUDI-8837
   - Type: Sub-task
   - Parent: https://issues.apache.org/jira/browse/HUDI-9108
   - Fix version(s):
     - 1.1.0
   
   
   ---
   
   
   ## Comments
   
   10/Jan/25 00:12;yihua;The test was added in
https://github.com/apache/hudi/pull/12490. Right now the validation excludes
the partition column. When that column is included, the validation fails.
   
    
   {code:java}
   def assertDfEquals(df1: DataFrame, df2: DataFrame): Unit = {
     assertEquals(df1.count, df2.count)
     // TODO(HUDI-8723): fix reading partition path field on metadata bootstrap table
     assertEquals(0, df1.drop(partitionColName).except(df2.drop(partitionColName)).count)
     assertEquals(0, df2.drop(partitionColName).except(df1.drop(partitionColName)).count)
   }
   {code}
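
For illustration only, the stricter check described above could be sketched as follows. This is not the code from the PR: the object name `StrictDfValidation` and the helpers `mismatchCount` and `nullPartitionCount` are hypothetical, and the sketch assumes both DataFrames carry the same column names. It compares every column, including the partition column, and separately counts rows whose partition value came back null.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper, not part of Hudi: a strict variant of the comparison
// that keeps the partition column instead of dropping it.
object StrictDfValidation {

  /** Number of rows that appear in exactly one of the two DataFrames. */
  def mismatchCount(expected: DataFrame, actual: DataFrame): Long = {
    // except() matches columns by position, so align the column order first.
    val aligned = actual.select(expected.columns.map(col): _*)
    expected.except(aligned).count + aligned.except(expected).count
  }

  /** Number of rows whose partition column is null. */
  def nullPartitionCount(df: DataFrame, partitionColName: String): Long =
    df.filter(col(partitionColName).isNull).count
}
```

In a test, `assertEquals(0, StrictDfValidation.mismatchCount(df1, df2))` would take the place of the two `drop`-based assertions, and a non-zero `nullPartitionCount` on the Hudi read result would point directly at the records whose partition path field was not populated.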
    
   
    ;;;
   
   ---
   
   10/Jan/25 00:26;daviszhang;So we can remove the .drop(partitionColName) in
the validation function you mentioned; I ran all tests in the test suite and
all are green. Assigned back to you.;;;
   
   ---
   
   28/Jan/25 01:05;yihua;This is still an issue when reading the partition
column value from a bootstrapped file slice (merging skeleton and data
files) using only the file group reader. Deferring this ticket to the 1.0.2
release.;;;


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

