wwj6591812 commented on code in PR #5715:
URL: https://github.com/apache/paimon/pull/5715#discussion_r2139414488


##########
paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/source/operator/ReadOperator.java:
##########
@@ -130,6 +137,14 @@ public void processElement(StreamRecord<Split> record) throws Exception {
                     reuseRecord.replace(nestedProjectedRowData);
                 }
                 output.collect(reuseRecord);
+
+                if (limit != null && numRecordsIn.getCount() == limit) {
+                    LOG.info(
+                            "When limit is not null, reader {} reach the limit record {}.",
+                            this,
+                            limit);
+                    return;

Review Comment:
   @yunfengzhou-hub 
   Hi, thanks for your reply.
   
   1. The reason I added this condition here: the reader should stop as soon as it has read `limit` records; it should not read unnecessary data (see the first sketch below).
   
   2. The limit passed to FlinkSourceBuilder#createReadBuilder is used by DataTableBatchScan#applyPushDownLimit, which tries its best to cut splits (see the second sketch below).
   
   3. In testBatchReadSourceWithSnapshot, even with this logic removed from ReadOperator, the `SELECT ... LIMIT 2` syntax still functioned correctly. The reason is that the Flink plan generates a global limit operator, which enforces the limit of 2 (plan screenshot and third sketch below).
   
![image](https://github.com/user-attachments/assets/0ff6a3ce-ab3b-40c7-b9ea-47c36e088da4)
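   
   To make point 1 concrete, here is a minimal self-contained sketch of the early-exit idea; the iterator, `emit` callback, and class name are illustrative stand-ins, not ReadOperator's actual API:
   
   ```java
   import java.util.Iterator;
   import java.util.function.Consumer;
   
   // Sketch for point 1 (illustrative only; not ReadOperator's actual API).
   // The reader stops as soon as the emitted-record count reaches the limit,
   // instead of draining the rest of the split.
   class EarlyExitReadSketch {
       static <T> long readWithLimit(Iterator<T> records, Long limit, Consumer<T> emit) {
           long count = 0;
           while (records.hasNext()) {
               emit.accept(records.next());
               count++;
               if (limit != null && count == limit) {
                   return count; // end early: further reads would be wasted work
               }
           }
           return count;
       }
   }
   ```
   
   With limit = 2 and a 1000-record split, only two records are ever read from the split.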
   
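   For point 2, a minimal sketch of best-effort split cutting, assuming a hypothetical split type that reports its row count (not Paimon's actual Split API):
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   
   // Sketch for point 2 (hypothetical split shape; Paimon's real API differs).
   // Keep only as many splits as are needed to cover `limit` rows. This is
   // best-effort: the last kept split may overshoot, so the reader-side check
   // (or a downstream limit operator) is still needed for exact results.
   class LimitPushDownSketch {
       interface CountedSplit {
           long rowCount();
       }
   
       static <S extends CountedSplit> List<S> cutSplits(List<S> splits, long limit) {
           List<S> kept = new ArrayList<>();
           long covered = 0;
           for (S split : splits) {
               if (covered >= limit) {
                   break; // enough rows are already covered
               }
               kept.add(split);
               covered += split.rowCount();
           }
           return kept;
       }
   }
   ```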

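   For point 3, a plain-Java sketch of what the global limit operator achieves (not Flink's actual operator code): it truncates the merged output of all readers, which is why `SELECT ... LIMIT 2` stays correct even when readers emit more rows than needed; the reader-side check is purely an optimization:
   
   ```java
   import java.util.ArrayList;
   import java.util.Iterator;
   import java.util.List;
   
   // Sketch for point 3: a single downstream stage that caps the merged
   // stream at `limit` rows, regardless of how much the upstream readers emit.
   class GlobalLimitSketch {
       static <T> List<T> globalLimit(Iterator<T> merged, long limit) {
           List<T> out = new ArrayList<>();
           while (merged.hasNext() && out.size() < limit) {
               out.add(merged.next());
           }
           return out; // exactly min(limit, available) rows
       }
   }
   ```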

