LiebingYu commented on code in PR #2197:
URL: https://github.com/apache/fluss/pull/2197#discussion_r2629962804


##########
fluss-flink/fluss-flink-common/src/main/java/org/apache/fluss/flink/source/enumerator/FlinkSourceEnumerator.java:
##########
@@ -664,9 +677,16 @@ private void handlePartitionsRemoved(Collection<Partition> 
removedPartitionInfo)
         pendingSplitAssignment.forEach(
                 (reader, splits) ->
                         splits.removeIf(
-                                split ->
-                                        removedPartitionsMap.containsKey(
-                                                
split.getTableBucket().getPartitionId())));
+                                split -> {
+                                    // Never remove LakeSnapshotSplit, because 
during union reads,
+                                    // data from the lake partition must still 
be read even if the
+                                    // partition has already expired in Fluss.
+                                    if (split instanceof LakeSnapshotSplit) {

Review Comment:
   As discuss offline, LakeSnapshotAndFlussLogSplit should not be considered 
here, as it pertains to how the reader handles partition expiration during the 
reading process—an issue that should be addressed through other mechanisms, 
such as a consumer. Introducing it here would add excessive complexity. 
Furthermore, even if it were included, we still couldn't guarantee complete 
data retrieval if the partition expires mid-read, because the data in Fluss 
would still be purged by TTL.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to