rionmonster commented on code in PR #2265:
URL: https://github.com/apache/fluss/pull/2265#discussion_r2653465669


##########
fluss-lake/fluss-lake-iceberg/src/test/java/org/apache/fluss/lake/iceberg/maintenance/IcebergRewriteITCase.java:
##########
@@ -186,6 +186,9 @@ void testLogTableCompaction() throws Exception {
                             t1, t1Bucket, ++i, true, 
Collections.singletonList(row(1, "v1"))));
             checkFileStatusInIcebergTable(t1, 3, false);
 
+            // Ensure tiering job has fully processed the previous writes
+            assertReplicaStatus(t1Bucket, i);

Review Comment:
   @luoyuxia 
   
I've been digging into this a bit further, and there appears to be a disparity between the files actually written to Iceberg and the latest offsets retrieved for them (specifically after asynchronous operations such as compaction). I think we may need some mechanism to improve consistency, at least within the bounds of the tests.
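   As a rough sketch of what such a mechanism could look like in the tests: poll the replica's lake log end offset until it reaches the expected value (or a timeout) before asserting on file counts. This is purely illustrative and self-contained; the `LongSupplier` stands in for whatever Fluss call actually exposes the replica's lake log end offset, and the names here are placeholders, not real APIs:

   ```java
   import java.util.function.LongSupplier;

   public class LakeConsistencyAwait {

       /**
        * Polls {@code offsetSupplier} until it reports at least {@code expected},
        * or the timeout elapses. Returns true if the offset caught up in time.
        */
       public static boolean awaitLogEndOffset(
               LongSupplier offsetSupplier, long expected, long timeoutMs, long pollIntervalMs)
               throws InterruptedException {
           long deadline = System.currentTimeMillis() + timeoutMs;
           while (System.currentTimeMillis() < deadline) {
               if (offsetSupplier.getAsLong() >= expected) {
                   return true;
               }
               Thread.sleep(pollIntervalMs);
           }
           // One final check after the deadline, in case the last sleep overshot it.
           return offsetSupplier.getAsLong() >= expected;
       }
   }
   ```

   Something like this would let the test tolerate the tiering/compaction latency instead of reading a potentially stale offset immediately.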
   
   I wrote a small monitor that runs several iterations of the tests and records the relevant state during each one. Here's a sample of the output:
   
   **Successful Test (Successfully Performs Compaction as Expected)**

   [MONITOR] log_table_33 - TableBucket{tableId=32, bucket=0}

   | IcebergFiles | IcebergSnapshotId | LakeSnapshotId | ReplicaLakeSnapId | ReplicaLakeLogEndOff | Timestamp |
   |---|---|---|---|---|---|
   | 0 | -1 | -1 | -1 | -1 | 1767113560800 |
   | 0 | -1 | -1 | -1 | -1 | 1767113561046 |
   | ... | ... | ... | ... | ... | ... |
   | 3 | 5182733673261799288 | 5182733673261799288 | 5443797100773076340 | 3 | 1767113615059 |
   | 3 | 5182733673261799288 | 5182733673261799288 | 5443797100773076340 | 3 | 1767113615307 |
   | 2 | 2575057976625237982 | 2575057976625237982 | 5443797100773076340 | 4 | 1767113615557 |
   | 2 | 2575057976625237982 | 2575057976625237982 | 5443797100773076340 | 4 | 1767113615808 |
   
   **Failing Test (File appeared to never be properly written before expected offset)**

   [MONITOR] log_table_34 - TableBucket{tableId=33, bucket=0}

   | IcebergFiles | IcebergSnapshotId | LakeSnapshotId | ReplicaLakeSnapId | ReplicaLakeLogEndOff | Timestamp |
   |---|---|---|---|---|---|
   | 0 | -1 | -1 | -1 | -1 | 1767113616327 |
   | ... | ... | ... | ... | ... | ... |
   | 2 | 7273969972093574431 | 7273969972093574431 | 7273969972093574431 | 2 | 1767113861627 |
   | 2 | 7273969972093574431 | 7273969972093574431 | 7273969972093574431 | 2 | 1767113861882 |
   | 2 | 7273969972093574431 | 7273969972093574431 | 7273969972093574431 | 2 | 1767113862135 |
   | 2 | 7273969972093574431 | 7273969972093574431 | 7273969972093574431 | 2 | 1767113862381 |
   | 2 | 7273969972093574431 | 7273969972093574431 | 7273969972093574431 | 2 | 1767113862633 |
   
   ```
   [ASSERTION FAILURE] Expected offset 3 but got 2 for bucket 
TableBucket{tableId=33, bucket=0}
     Replica Lake Snapshot ID: 7273969972093574431
     Current State:
       Iceberg Files: 2
       Iceberg Snapshot ID: 7273969972093574431
       Lake Snapshot ID (from admin): 7273969972093574431
       Replica Lake Snapshot ID: 7273969972093574431
       Replica Lake Log End Offset: 2
   ```
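   For context, the monitor that produced the tables above is nothing more than a periodic poll that records one row per sample. A minimal, self-contained sketch of that shape (the `Supplier` is a placeholder for the real lookups against the Iceberg table and the replica; none of these names are Fluss APIs):

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.function.Supplier;

   public class LakeStateMonitor {

       /** Formats one sampled row of lake/replica state, like the tables above. */
       public static String formatRow(
               int icebergFiles, long icebergSnapshotId, long lakeSnapshotId,
               long replicaLakeSnapId, long replicaLakeLogEndOff, long timestamp) {
           return String.format(
                   "| %d | %d | %d | %d | %d | %d |",
                   icebergFiles, icebergSnapshotId, lakeSnapshotId,
                   replicaLakeSnapId, replicaLakeLogEndOff, timestamp);
       }

       /** Samples the supplied state every {@code intervalMs}, {@code iterations} times. */
       public static List<String> sample(
               Supplier<String> rowSupplier, int iterations, long intervalMs)
               throws InterruptedException {
           List<String> rows = new ArrayList<>();
           for (int i = 0; i < iterations; i++) {
               rows.add(rowSupplier.get());
               Thread.sleep(intervalMs);
           }
           return rows;
       }
   }
   ```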
   
   I'm not sure whether this is an artifact of the tests themselves or a legitimate issue. Any thoughts? Happy to continue digging. Given the inconsistency, it _feels_ like a race condition, either in writing the files or in reading stale offsets directly from the data lake. In a real-world environment this would likely be tolerable (I suspect it's just a minor latency spike that would eventually resolve itself), but within the confines of a test it makes things flaky.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to