codope commented on code in PR #18055:
URL: https://github.com/apache/hudi/pull/18055#discussion_r2749676544


##########
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:
##########
@@ -559,11 +560,30 @@ public StorageConfiguration<?> getStorageConf() {
    */
   public synchronized HoodieActiveTimeline getActiveTimeline() {
     if (activeTimeline == null) {
-      activeTimeline = 
tableFormat.getTimelineFactory().createActiveTimeline(this);
+      if (rawActiveTimeline == null) {
+        rawActiveTimeline = getRawActiveTimeline();
+      }
+      List<HoodieInstant> activeInstants = 
TimelineLayout.fromVersion(getTimelineLayoutVersion())
+              .filterHoodieInstants(rawActiveTimeline.getInstants().stream())
+              .sorted()
+              .collect(Collectors.toList());
+      activeTimeline = 
tableFormat.getTimelineFactory().createActiveTimeline(this, activeInstants);

Review Comment:
   This modifies `getActiveTimeline()` to derive from `rawActiveTimeline`. But 
`reloadActiveTimeline()` creates `activeTimeline` directly via 
`createActiveTimeline(this)`, which performs its own filesystem scan. This 
means:
   - `getActiveTimeline()` --> derives from `rawActiveTimeline`
   - `reloadActiveTimeline()` --> scans filesystem independently
   
   This is inconsistent and could lead to subtle bugs where the two timelines 
diverge.



##########
hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java:
##########


Review Comment:
   Should we reset `rawActiveTimeline` here to avoid stale cache?



##########
hudi-common/src/main/java/org/apache/hudi/common/table/timeline/TimelineFactory.java:
##########
@@ -42,4 +43,6 @@ public abstract class TimelineFactory implements Serializable 
{
   public abstract HoodieActiveTimeline 
createActiveTimeline(HoodieTableMetaClient metaClient, boolean 
applyLayoutFilter);
 
   public abstract CompletionTimeQueryView 
createCompletionTimeQueryView(HoodieTableMetaClient metaClient);
+
+  public abstract HoodieActiveTimeline 
createActiveTimeline(HoodieTableMetaClient metaClient, List<HoodieInstant> 
instants);

Review Comment:
   Please add javadoc explaining:
   - Purpose of this method
   - That it creates a timeline from pre-loaded instants (no filesystem scan)
   - When to use this vs `createActiveTimeline(metaClient)` or 
`createActiveTimeline(metaClient, applyLayoutFilter)`



##########
hudi-hadoop-common/src/test/java/org/apache/hudi/common/table/TestHoodieTableMetaClient.java:
##########
@@ -378,4 +379,31 @@ void testReadIndexDefFromStorage() throws Exception {
       }
     }, "Should throw HoodieIOException for invalid JSON");
   }
+
+  @Test
+  void testGetRawActiveTimeline() {

Review Comment:
   thanks for adding this test. I think we should cover more scenarios:
   - Verify that `rawActiveTimeline` contains all instants (including 
intermediate states)
   - Verify that `activeTimeline` contains filtered instants (only latest state 
per action)
   - Verify that filtering is correctly applied
   - Verify that `reloadActiveTimeline()` properly refreshes both caches
   - Test with actual instants in different states (REQUESTED, INFLIGHT, 
COMPLETED)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to