codope commented on code in PR #18055:
URL: https://github.com/apache/hudi/pull/18055#discussion_r2749676544
##########
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:
##########
@@ -559,11 +560,30 @@ public StorageConfiguration<?> getStorageConf() {
*/
public synchronized HoodieActiveTimeline getActiveTimeline() {
if (activeTimeline == null) {
- activeTimeline =
tableFormat.getTimelineFactory().createActiveTimeline(this);
+ if (rawActiveTimeline == null) {
+ rawActiveTimeline = getRawActiveTimeline();
+ }
+ List<HoodieInstant> activeInstants =
TimelineLayout.fromVersion(getTimelineLayoutVersion())
+ .filterHoodieInstants(rawActiveTimeline.getInstants().stream())
+ .sorted()
+ .collect(Collectors.toList());
+ activeTimeline =
tableFormat.getTimelineFactory().createActiveTimeline(this, activeInstants);
Review Comment:
This modifies `getActiveTimeline()` to derive from `rawActiveTimeline`. But
`reloadActiveTimeline()` creates `activeTimeline` directly via
`createActiveTimeline(this)`, which performs its own filesystem scan. This
means:
- `getActiveTimeline()` --> derives from `rawActiveTimeline`
- `reloadActiveTimeline()` --> scans filesystem independently
This is inconsistent and could lead to subtle bugs where the two timelines
diverge.
##########
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:
##########
Review Comment:
Should we reset `rawActiveTimeline` here to avoid stale cache?
##########
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineFactory.java:
##########
@@ -42,4 +43,6 @@ public abstract class TimelineFactory implements Serializable
{
public abstract HoodieActiveTimeline
createActiveTimeline(HoodieTableMetaClient metaClient, boolean
applyLayoutFilter);
public abstract CompletionTimeQueryView
createCompletionTimeQueryView(HoodieTableMetaClient metaClient);
+
+ public abstract HoodieActiveTimeline
createActiveTimeline(HoodieTableMetaClient metaClient, List<HoodieInstant>
instants);
Review Comment:
Please add javadoc explaining:
- Purpose of this method
- That it creates a timeline from pre-loaded instants (no filesystem scan)
- When to use this vs `createActiveTimeline(metaClient)` or
`createActiveTimeline(metaClient, applyLayoutFilter)`
##########
hudi-hadoop-common/src/test/java/org/apache/hudi/common/table/TestHoodieTableMetaClient.java:
##########
@@ -378,4 +379,31 @@ void testReadIndexDefFromStorage() throws Exception {
}
}, "Should throw HoodieIOException for invalid JSON");
}
+
+ @Test
+ void testGetRawActiveTimeline() {
Review Comment:
thanks for adding this test. I think we should cover more scenarios:
- Verify that `rawActiveTimeline` contains all instants (including
intermediate states)
- Verify that `activeTimeline` contains filtered instants (only latest state
per action)
- Verify that filtering is correctly applied
- Verify that `reloadActiveTimeline()` properly refreshes both caches
- Test with actual instants in different states (REQUESTED, INFLIGHT,
COMPLETED)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]