yihua opened a new pull request, #9436:
URL: https://github.com/apache/hudi/pull/9436

   ### Change Logs
   
   This PR fixes partition validation in the metadata table validator 
(`HoodieMetadataTableValidator`) so that it considers only commits, which 
avoids false positives.
   
   Before this fix, partition validation considered all instants, including 
rollbacks, so a completed rollback in the data table's timeline interfered 
with the validation. Only commits should be considered. See the following 
example.
   Timeline of the data table (DT) and the metadata table (MDT, shown in the 
`MT` columns):
   ```
   
   ╔═════╤═══════════════════╤═════════════╤═══════════╤═══════════════════╤═════════════╤═════════════╤═════════════╤═════════════╤═══════════╤═════════════╤═════════════╤═════════════╗
   ║ No. │ Instant           │ Action      │ State     │ Rollback Info     │ Requested   │ Inflight    │ Completed   │ MT          │ MT        │ MT          │ MT          │ MT          ║
   ║     │                   │             │           │                   │ Time        │ Time        │ Time        │ Action      │ State     │ Requested   │ Inflight    │ Completed   ║
   ║     │                   │             │           │                   │             │             │             │             │           │ Time        │ Time        │ Time        ║
   ╠═════╪═══════════════════╪═════════════╪═══════════╪═══════════════════╪═════════════╪═════════════╪═════════════╪═════════════╪═══════════╪═════════════╪═════════════╪═════════════╣
   ║ 0   │ 00000000000000010 │ -           │ -         │ -                 │ -           │ -           │ -           │ deltacommit │ COMPLETED │ 08-11 23:59 │ 08-11 23:59 │ 08-11 23:59 ║
   ╟─────┼───────────────────┼─────────────┼───────────┼───────────────────┼─────────────┼─────────────┼─────────────┼─────────────┼───────────┼─────────────┼─────────────┼─────────────╢
   ║ 1   │ 00000000000000011 │ -           │ -         │ -                 │ -           │ -           │ -           │ deltacommit │ COMPLETED │ 08-11 23:59 │ 08-11 23:59 │ 08-11 23:59 ║
   ╟─────┼───────────────────┼─────────────┼───────────┼───────────────────┼─────────────┼─────────────┼─────────────┼─────────────┼───────────┼─────────────┼─────────────┼─────────────╢
   ║ 2   │ 20230812065907463 │ deltacommit │ INFLIGHT  │ Rolled back by    │ 08-11 23:59 │ 08-12 00:00 │ -           │ -           │ -         │ -           │ -           │ -           ║
   ║     │                   │             │           │ 20230812070238150 │             │             │             │             │           │             │             │             ║
   ╟─────┼───────────────────┼─────────────┼───────────┼───────────────────┼─────────────┼─────────────┼─────────────┼─────────────┼───────────┼─────────────┼─────────────┼─────────────╢
   ║ 3   │ 20230812070238150 │ rollback    │ INFLIGHT  │ Rolls back        │ 08-12 00:02 │ 08-12 00:02 │ -           │ -           │ -         │ -           │ -           │ -           ║
   ║     │                   │             │           │ 20230812065907463 │             │             │             │             │           │             │             │             ║
   ╟─────┼───────────────────┼─────────────┼───────────┼───────────────────┼─────────────┼─────────────┼─────────────┼─────────────┼───────────┼─────────────┼─────────────┼─────────────╢
   ║ 4   │ 20230812070241429 │ -           │ -         │ -                 │ -           │ -           │ -           │ rollback    │ COMPLETED │ 08-12 00:02 │ 08-12 00:02 │ 08-12 00:02 ║
   ╟─────┼───────────────────┼─────────────┼───────────┼───────────────────┼─────────────┼─────────────┼─────────────┼─────────────┼───────────┼─────────────┼─────────────┼─────────────╢
   ║ 5   │ 20230812070351902 │ deltacommit │ REQUESTED │ -                 │ 08-12 00:04 │ -           │ -           │ -           │ -         │ -           │ -           │ -           ║
   ╟─────┼───────────────────┼─────────────┼───────────┼───────────────────┼─────────────┼─────────────┼─────────────┼─────────────┼───────────┼─────────────┼─────────────┼─────────────╢
   ║ 6   │ 20230812070532879 │ deltacommit │ REQUESTED │ -                 │ 08-12 00:06 │ -           │ -           │ -           │ -         │ -           │ -           │ -           ║
   ╟─────┼───────────────────┼─────────────┼───────────┼───────────────────┼─────────────┼─────────────┼─────────────┼─────────────┼───────────┼─────────────┼─────────────┼─────────────╢
   ║ 7   │ 20230812070605364 │ rollback    │ COMPLETED │ Rolls back        │ 08-12 00:06 │ 08-12 00:06 │ 08-12 00:06 │ deltacommit │ COMPLETED │ 08-12 00:06 │ 08-12 00:06 │ 08-12 00:06 ║
   ║     │                   │             │           │ 20230812070205857 │             │             │             │             │           │             │             │             ║
   ╟─────┼───────────────────┼─────────────┼───────────┼───────────────────┼─────────────┼─────────────┼─────────────┼─────────────┼───────────┼─────────────┼─────────────┼─────────────╢
   ║ 8   │ 20230812070606670 │ -           │ -         │ -                 │ -           │ -           │ -           │ rollback    │ COMPLETED │ 08-12 00:06 │ 08-12 00:06 │ 08-12 00:06 ║
   ╚═════╧═══════════════════╧═════════════╧═══════════╧═══════════════════╧═════════════╧═════════════╧═════════════╧═════════════╧═══════════╧═════════════╧═════════════╧═════════════╝
   ```
   The partition metadata shows that the partition was created by an inflight 
commit (`20230812065907463`) that is being rolled back:
   ```
   2023/06/24/.hoodie_partition_metadata
   #partition metadata
   #Sat Aug 12 07:00:21 UTC 2023
   commitTime=20230812065907463
   partitionDepth=3
   ```
   Since the data table has no completed commit, the partitions should not be 
validated. Yet the validator throws the following exception:
   ```
   org.apache.hudi.exception.HoodieValidationException: Compare Partitions Failed! AllPartitionPathsFromFS : [2023/06/24, 2023/06/25, 2023/06/26, 2023/06/27, 2023/06/28, 2023/06/29, 2023/06/30, 2023/07/01, 2023/07/02, 2023/07/03] and allPartitionPathsMeta : []
        at org.apache.hudi.utilities.HoodieMetadataTableValidator.validatePartitions(HoodieMetadataTableValidator.java:558)
        at org.apache.hudi.utilities.HoodieMetadataTableValidator.doMetadataTableValidation(HoodieMetadataTableValidator.java:435)
        at org.apache.hudi.utilities.HoodieMetadataTableValidator.doHoodieMetadataTableValidationOnce(HoodieMetadataTableValidator.java:377)
        at org.apache.hudi.utilities.HoodieMetadataTableValidator.run(HoodieMetadataTableValidator.java:362)
        at org.apache.hudi.utilities.HoodieMetadataTableValidator.main(HoodieMetadataTableValidator.java:342)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   ```
   
   After the fix, the validation succeeds and logs `The result of getting all partitions is null or empty, skip current validation`, which is the correct behavior.
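
The difference between "all completed instants" and "completed commits only" can be illustrated with a small, self-contained model of the data table timeline shown earlier. This is a sketch under stated assumptions, not the validator's actual code: `PartitionValidationSketch`, its nested `Instant` class, and all method names are hypothetical; the set of commit-type actions (`commit`, `deltacommit`, `replacecommit`) and the simple membership check on `commitTime` are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical model for illustration only; none of these types are Hudi classes.
final class PartitionValidationSketch {

  static final class Instant {
    final String timestamp;
    final String action; // e.g. "deltacommit" or "rollback"
    final String state;  // "REQUESTED", "INFLIGHT" or "COMPLETED"

    Instant(String timestamp, String action, String state) {
      this.timestamp = timestamp;
      this.action = action;
      this.state = state;
    }
  }

  // Assumption: the actions that count as commits for validation purposes.
  static final Set<String> COMMIT_ACTIONS =
      new HashSet<>(Arrays.asList("commit", "deltacommit", "replacecommit"));

  // Every completed instant, rollbacks included (the pre-fix view of the timeline).
  static Set<String> completedInstants(List<Instant> timeline) {
    return timeline.stream()
        .filter(i -> "COMPLETED".equals(i.state))
        .map(i -> i.timestamp)
        .collect(Collectors.toSet());
  }

  // Only completed commit-type instants (the post-fix view of the timeline).
  static Set<String> completedCommits(List<Instant> timeline) {
    return timeline.stream()
        .filter(i -> "COMPLETED".equals(i.state))
        .filter(i -> COMMIT_ACTIONS.contains(i.action))
        .map(i -> i.timestamp)
        .collect(Collectors.toSet());
  }

  // Keep only partitions whose creating commit (commitTime) has completed.
  static List<String> partitionsToValidate(Map<String, String> partitionToCommitTime,
                                           Set<String> completedCommits) {
    return partitionToCommitTime.entrySet().stream()
        .filter(e -> completedCommits.contains(e.getValue()))
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }

  // Data table instants taken from the timeline in the PR description.
  static List<Instant> exampleTimeline() {
    return Arrays.asList(
        new Instant("20230812065907463", "deltacommit", "INFLIGHT"),
        new Instant("20230812070238150", "rollback", "INFLIGHT"),
        new Instant("20230812070351902", "deltacommit", "REQUESTED"),
        new Instant("20230812070532879", "deltacommit", "REQUESTED"),
        new Instant("20230812070605364", "rollback", "COMPLETED"));
  }

  public static void main(String[] args) {
    List<Instant> timeline = exampleTimeline();
    Map<String, String> partitions = new LinkedHashMap<>();
    partitions.put("2023/06/24", "20230812065907463"); // from .hoodie_partition_metadata

    Set<String> commits = completedCommits(timeline);
    System.out.println("completed instants: " + completedInstants(timeline));
    System.out.println("completed commits:  " + commits);
    System.out.println("partitions to validate: " + partitionsToValidate(partitions, commits));
  }
}
```

In this model the only completed instant in the data table timeline is the rollback `20230812070605364`, so the set of completed instants is non-empty while the set of completed commits is empty. With commits only, no partition qualifies for validation, which matches the "null or empty, skip current validation" outcome described above.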
   
   ### Impact
   
   Bug fix for the metadata table validator (`HoodieMetadataTableValidator`).
   
   ### Risk level
   
   none
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   

