rionmonster commented on code in PR #2265:
URL: https://github.com/apache/fluss/pull/2265#discussion_r2678744002
##########
fluss-lake/fluss-lake-iceberg/src/test/java/org/apache/fluss/lake/iceberg/maintenance/IcebergRewriteITCase.java:
##########
@@ -186,6 +186,9 @@ void testLogTableCompaction() throws Exception {
t1, t1Bucket, ++i, true,
Collections.singletonList(row(1, "v1"))));
checkFileStatusInIcebergTable(t1, 3, false);
+ // Ensure tiering job has fully processed the previous writes
+ assertReplicaStatus(t1Bucket, i);
Review Comment:
@luoyuxia
Following up on our private conversations exploring this, I'm including my
findings below.
I was able to reproduce the failure with some extended
debugging logs added throughout the `LakeTableTieringManager`. I've [created a
gist
here](https://gist.github.com/rionmonster/6b25bf7def8ac39f2cd6d112ccc02b6c)
with the logs. Reviewing them, it looks like we have the following chain
of events:
- At `411079` we have a series of "polled tableId=x but tablePath is
null (state=null, epoch=null)" messages, which repeat 10+ times
- After this burst we can see that the test fails (since rewrite/compaction
didn't complete in time)
I think we can interpret this as:
- Tiering service requested work (via `requestTable()`)
- Manager pulled table ids from `pendingTieringTables` but one of the
following was true:
- `tablePaths` didn't contain the id
- `tieringStates` didn't contain the id
- `tableTierEpoch` didn't contain the id
- This means that the table ids were no longer registered (e.g., previously
dropped/removed) but were still present in `pendingTieringTables`
- This causes any `requestTable()` calls to continually drain/recurse/loop
over these stale entries which could delay processing of actual pending tables
- This delay could be enough to cause the failed assertion
I think the proposed fixes would help with this, although we may need to
consider adjusting the `removeLakeTable()` call so that it also removes the
table from the pending tables, which I'll add to the PR:
```java
public void removeLakeTable(long tableId) {
    inLock(lock, () -> {
        // Omitted for brevity
        pendingTieringTables.remove(tableId);
    });
}
```
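To illustrate the interpretation above, here is a minimal, self-contained sketch. It is not the actual `LakeTableTieringManager` code: the class name `StaleTieringSketch`, the single `tablePaths` map standing in for all three registries, the `removeWithoutPendingCleanup()` method, and the `stalePolls` counter are all assumptions made for illustration, and locking is left out entirely. It only models how ids left behind in the pending queue get polled and skipped before a live table is returned, and how removing the pending entry avoids that.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical, simplified model of the manager; not the real Fluss class. */
public class StaleTieringSketch {
    // Stand-in for the real registries (tablePaths, tieringStates, tableTierEpoch).
    private final Map<Long, String> tablePaths = new HashMap<>();
    private final Queue<Long> pendingTieringTables = new ArrayDeque<>();
    // Illustration only: counts polls wasted on ids that are no longer registered.
    public int stalePolls = 0;

    public void addLakeTable(long tableId, String tablePath) {
        tablePaths.put(tableId, tablePath);
        pendingTieringTables.add(tableId);
    }

    /** Assumed pre-fix behaviour: the id is unregistered but stays in the pending queue. */
    public void removeWithoutPendingCleanup(long tableId) {
        tablePaths.remove(tableId);
    }

    /** Proposed behaviour: the pending entry is removed together with the registration. */
    public void removeLakeTable(long tableId) {
        tablePaths.remove(tableId);
        pendingTieringTables.remove(tableId);
    }

    /** Returns the next registered table path, skipping stale ids; null if none is pending. */
    public String requestTable() {
        Long tableId;
        while ((tableId = pendingTieringTables.poll()) != null) {
            String path = tablePaths.get(tableId);
            if (path != null) {
                return path;
            }
            stalePolls++; // stale entry: polled, but no longer registered
        }
        return null;
    }

    public static void main(String[] args) {
        StaleTieringSketch stale = new StaleTieringSketch();
        StaleTieringSketch fixed = new StaleTieringSketch();
        for (long id = 1; id <= 10; id++) {
            stale.addLakeTable(id, "db.t" + id);
            stale.removeWithoutPendingCleanup(id);
            fixed.addLakeTable(id, "db.t" + id);
            fixed.removeLakeTable(id);
        }
        stale.addLakeTable(11L, "db.live");
        fixed.addLakeTable(11L, "db.live");
        System.out.println(stale.requestTable() + " after " + stale.stalePolls + " stale polls");
        System.out.println(fixed.requestTable() + " after " + fixed.stalePolls + " stale polls");
    }
}
```

In this toy model, ten stale ids queued ahead of one live table cost ten wasted polls in the first variant and none in the second. The real manager does more work per poll and runs under a lock, so the actual delay would need to be confirmed against the logs.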
I've created a branch with these additional logs if you would like to
explore it yourself at
https://github.com/rionmonster/fluss/tree/for-yuxia-with-logs