yihua opened a new pull request, #9436:
URL: https://github.com/apache/hudi/pull/9436
### Change Logs
This PR fixes partition validation in the metadata table validator
(`HoodieMetadataTableValidator`) so that it only considers commits, which
avoids false positives.
Before this fix, partition validation considered all instants, including
rollbacks. A completed rollback in the data table's timeline could therefore
interfere with partition validation in the metadata table validator. Only
commits should be considered in the validation. See the following example.
Timeline of the data table (DT) and the metadata table (MDT, shown in the `MT` columns):
```
│ No. │ Instant           │ Action      │ State     │ Rollback Info                    │ Requested Time │ Inflight Time │ Completed Time │ MT Action   │ MT State  │ MT Requested Time │ MT Inflight Time │ MT Completed Time │
├─────┼───────────────────┼─────────────┼───────────┼──────────────────────────────────┼────────────────┼───────────────┼────────────────┼─────────────┼───────────┼───────────────────┼──────────────────┼───────────────────┤
│ 0   │ 00000000000000010 │ -           │ -         │ -                                │ -              │ -             │ -              │ deltacommit │ COMPLETED │ 08-11 23:59       │ 08-11 23:59      │ 08-11 23:59       │
│ 1   │ 00000000000000011 │ -           │ -         │ -                                │ -              │ -             │ -              │ deltacommit │ COMPLETED │ 08-11 23:59       │ 08-11 23:59      │ 08-11 23:59       │
│ 2   │ 20230812065907463 │ deltacommit │ INFLIGHT  │ Rolled back by 20230812070238150 │ 08-11 23:59    │ 08-12 00:00   │ -              │ -           │ -         │ -                 │ -                │ -                 │
│ 3   │ 20230812070238150 │ rollback    │ INFLIGHT  │ Rolls back 20230812065907463     │ 08-12 00:02    │ 08-12 00:02   │ -              │ -           │ -         │ -                 │ -                │ -                 │
│ 4   │ 20230812070241429 │ -           │ -         │ -                                │ -              │ -             │ -              │ rollback    │ COMPLETED │ 08-12 00:02       │ 08-12 00:02      │ 08-12 00:02       │
│ 5   │ 20230812070351902 │ deltacommit │ REQUESTED │ -                                │ 08-12 00:04    │ -             │ -              │ -           │ -         │ -                 │ -                │ -                 │
│ 6   │ 20230812070532879 │ deltacommit │ REQUESTED │ -                                │ 08-12 00:06    │ -             │ -              │ -           │ -         │ -                 │ -                │ -                 │
│ 7   │ 20230812070605364 │ rollback    │ COMPLETED │ Rolls back 20230812070205857     │ 08-12 00:06    │ 08-12 00:06   │ 08-12 00:06    │ deltacommit │ COMPLETED │ 08-12 00:06       │ 08-12 00:06      │ 08-12 00:06       │
│ 8   │ 20230812070606670 │ -           │ -         │ -                                │ -              │ -             │ -              │ rollback    │ COMPLETED │ 08-12 00:06       │ 08-12 00:06      │ 08-12 00:06       │
```
The partition metadata indicates that the partition was created by an
inflight commit that is being rolled back:
```
2023/06/24/.hoodie_partition_metadata
#partition metadata
#Sat Aug 12 07:00:21 UTC 2023
commitTime=20230812065907463
partitionDepth=3
```
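The `commitTime` field is what ties the partition to the instant that created it. As a minimal sketch, assuming the file body is in Java properties format (which the `#` comment header suggests), the value could be read as below. The class and method names are hypothetical and are not part of Hudi.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration only; not Hudi's own reader.
public class PartitionMetadataSketch {

    // Parses properties-style text and returns the commitTime value,
    // or null if the key is absent.
    static String readCommitTime(String content) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("commitTime");
    }

    public static void main(String[] args) {
        // File body copied from the example above (the path line is not part of the body).
        String content = String.join("\n",
            "#partition metadata",
            "#Sat Aug 12 07:00:21 UTC 2023",
            "commitTime=20230812065907463",
            "partitionDepth=3");
        System.out.println(readCommitTime(content)); // prints 20230812065907463
    }
}
```

The returned instant time matches row 2 of the timeline, the inflight `deltacommit`.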
Since the data table has no completed commit, the partition should not be
validated. Yet the validator throws the following exception:
```
org.apache.hudi.exception.HoodieValidationException: Compare Partitions Failed! AllPartitionPathsFromFS : [2023/06/24, 2023/06/25, 2023/06/26, 2023/06/27, 2023/06/28, 2023/06/29, 2023/06/30, 2023/07/01, 2023/07/02, 2023/07/03] and allPartitionPathsMeta : []
    at org.apache.hudi.utilities.HoodieMetadataTableValidator.validatePartitions(HoodieMetadataTableValidator.java:558)
    at org.apache.hudi.utilities.HoodieMetadataTableValidator.doMetadataTableValidation(HoodieMetadataTableValidator.java:435)
    at org.apache.hudi.utilities.HoodieMetadataTableValidator.doHoodieMetadataTableValidationOnce(HoodieMetadataTableValidator.java:377)
    at org.apache.hudi.utilities.HoodieMetadataTableValidator.run(HoodieMetadataTableValidator.java:362)
    at org.apache.hudi.utilities.HoodieMetadataTableValidator.main(HoodieMetadataTableValidator.java:342)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```
After the fix, the validation succeeds and logs
`The result of getting all partitions is null or empty, skip current validation`,
which is correct.
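The difference between the two behaviors can be sketched with a small self-contained model. This is not the validator's actual code: `TimelineInstant`, `completed`, and `containsOrBeforeStart` are hypothetical stand-ins, and the acceptance rule (an instant counts if it is in the completed timeline or older than that timeline's first instant) is an assumption about how a completed rollback could make the partition look valid.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Simplified, hypothetical model of the data table timeline; not Hudi's API.
public class PartitionValidationSketch {
    enum State { REQUESTED, INFLIGHT, COMPLETED }

    record TimelineInstant(String time, String action, State state) {}

    // Write actions only; rollbacks are deliberately excluded.
    static final Set<String> COMMIT_ACTIONS = Set.of("commit", "deltacommit", "replacecommit");

    // Data table instants from the example timeline (rows 2, 3, 5, 6, 7).
    static List<TimelineInstant> exampleTimeline() {
        return List.of(
            new TimelineInstant("20230812065907463", "deltacommit", State.INFLIGHT),
            new TimelineInstant("20230812070238150", "rollback", State.INFLIGHT),
            new TimelineInstant("20230812070351902", "deltacommit", State.REQUESTED),
            new TimelineInstant("20230812070532879", "deltacommit", State.REQUESTED),
            new TimelineInstant("20230812070605364", "rollback", State.COMPLETED));
    }

    // Completed instants, optionally restricted to commit actions.
    static List<TimelineInstant> completed(List<TimelineInstant> timeline, boolean commitsOnly) {
        return timeline.stream()
            .filter(i -> i.state() == State.COMPLETED)
            .filter(i -> !commitsOnly || COMMIT_ACTIONS.contains(i.action()))
            .collect(Collectors.toList());
    }

    // Assumed rule: accept if the instant is in the completed list or is
    // older than its first instant. An empty list accepts nothing.
    static boolean containsOrBeforeStart(List<TimelineInstant> completed, String time) {
        if (completed.isEmpty()) {
            return false;
        }
        String first = completed.stream().map(TimelineInstant::time).min(String::compareTo).get();
        return time.compareTo(first) < 0
            || completed.stream().anyMatch(i -> i.time().equals(time));
    }

    public static void main(String[] args) {
        String partitionCommitTime = "20230812065907463";
        List<TimelineInstant> timeline = exampleTimeline();
        // All completed instants: the completed rollback makes the partition look valid.
        System.out.println(containsOrBeforeStart(completed(timeline, false), partitionCommitTime)); // true
        // Completed commits only: nothing is completed, so the partition is skipped.
        System.out.println(containsOrBeforeStart(completed(timeline, true), partitionCommitTime)); // false
    }
}
```

Under these assumptions, restricting the timeline to commits leaves no partition to compare, which matches the "skip current validation" message.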
### Impact
Bug fix for the metadata table validator (`HoodieMetadataTableValidator`).
### Risk level
none
### Documentation Update
N/A
### Contributor's checklist
- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed