Re: [PR] [spark] Support push down limit for primary key table [paimon]

via GitHub Wed, 09 Oct 2024 22:29:20 -0700


ulysses-you commented on code in PR #4299:
URL: https://github.com/apache/paimon/pull/4299#discussion_r1794680201



##########
paimon-core/src/main/java/org/apache/paimon/table/source/DataTableBatchScan.java:
##########
@@ -96,6 +96,27 @@ private StartingScanner.Result applyPushDownLimit(StartingScanner.Result result)
             long scannedRowCount = 0;
             SnapshotReader.Plan plan = ((ScannedResult) result).plan();
             List<DataSplit> splits = plan.dataSplits();
+            if (splits.isEmpty()) {
+                return result;
+            }
+
+            // We first add the rawConvertible split to avoid merging, and if the row count
+            // is still less than limit number, then add split which is not rawConvertible.
+            splits.sort(
+                    (x, y) -> {
+                        if (x.rawConvertible() && y.rawConvertible()) {
+                            return 0;
+                        } else if (x.rawConvertible()) {
+                            return -1;
+                        } else {
+                            return 1;
+                        }
+                    });
+            // fast return if there is no rawConvertible split

Review Comment:
   This follows the previous behavior: if none of the splits is rawConvertible,
   we add all splits to the final plan. pushDownLimit is only a best-effort
   optimization; the engine applies the limit again afterwards.
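
   To make the best-effort semantics concrete, here is a minimal, self-contained
   sketch of the idea, not the code in this PR. `Split` is a hypothetical
   stand-in for `DataSplit`, and `applyLimit` is an illustrative name. The
   sketch assumes that only rawConvertible splits have a row count that can be
   trusted without merging, so only those count towards the limit. It also uses
   a key-based comparator, because the hand-written one in the hunk returns 1
   in both directions when neither split is rawConvertible, which may not
   satisfy the `Comparator` contract.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LimitPushDownSketch {

    /** Hypothetical stand-in for DataSplit; only the fields this sketch needs. */
    public static final class Split {
        public final String name;
        public final boolean rawConvertible;
        public final long rowCount;

        public Split(String name, boolean rawConvertible, long rowCount) {
            this.name = name;
            this.rawConvertible = rawConvertible;
            this.rowCount = rowCount;
        }
    }

    /**
     * Best-effort limit push-down: keep rawConvertible splits until their
     * row counts reach the limit. If the limit is never reached, every
     * split is kept and the engine's own LIMIT does the rest.
     */
    public static List<Split> applyLimit(List<Split> splits, long limit) {
        List<Split> sorted = new ArrayList<>(splits);
        // Boolean orders false before true, so sorting on the negated flag
        // puts rawConvertible splits first. List.sort is stable, so the
        // original relative order within each group is preserved.
        sorted.sort(Comparator.comparing((Split s) -> !s.rawConvertible));

        List<Split> selected = new ArrayList<>();
        long scannedRowCount = 0;
        for (Split split : sorted) {
            selected.add(split);
            // Assumption: row counts of splits that still need merging are
            // not reliable, so only rawConvertible splits are counted.
            if (split.rawConvertible) {
                scannedRowCount += split.rowCount;
                if (scannedRowCount >= limit) {
                    break;
                }
            }
        }
        return selected;
    }
}
```

   With no rawConvertible split in the input, the loop never reaches the limit
   and the result contains every split, which matches the behavior described
   above.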



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@paimon.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
