Re: [PR] [KYUUBI #7245] Fix arrow batch converter error [kyuubi]

via GitHub Wed, 24 Dec 2025 06:24:26 -0800


Copilot commented on code in PR #7246:
URL: https://github.com/apache/kyuubi/pull/7246#discussion_r2645790522



##########
externals/kyuubi-spark-sql-engine/src/main/scala/org/apache/spark/sql/execution/arrow/KyuubiArrowConverters.scala:
##########
@@ -274,19 +274,28 @@ object KyuubiArrowConverters extends SQLConfHelper with 
Logging {
       var estimatedBatchSize = 0L
       Utils.tryWithSafeFinally {
 
+        def isBatchSizeLimitExceeded: Boolean = {
+          // If `maxEstimatedBatchSize` is zero or negative, it implies 
unlimited.
+          maxEstimatedBatchSize > 0 && estimatedBatchSize >= 
maxEstimatedBatchSize
+        }
+        def isRecordLimitExceeded: Boolean = {
+          // If `maxRecordsPerBatch` is zero or negative, it implies unlimited.
+          maxRecordsPerBatch > 0 && rowCountInLastBatch >= maxRecordsPerBatch
+        }
+        def isGlobalLimitNotReached: Boolean = {
+          // If the limit is negative, it means no restriction
+          // or the current number of rows has not reached the limit.
+          rowCount < limit || limit < 0
+        }
+
         // Always write the first row.
-        while (rowIter.hasNext && (
-            // For maxBatchSize and maxRecordsPerBatch, respect whatever 
smaller.
+        while (rowIter.hasNext && isGlobalLimitNotReached && (
             // If the size in bytes is positive (set properly), always write 
the first row.
-            rowCountInLastBatch == 0 && maxEstimatedBatchSize > 0 ||
-              // If the size in bytes of rows are 0 or negative, unlimit it.
-              estimatedBatchSize <= 0 ||
-              estimatedBatchSize < maxEstimatedBatchSize ||
-              // If the size of rows are 0 or negative, unlimit it.
-              maxRecordsPerBatch <= 0 ||
-              rowCountInLastBatch < maxRecordsPerBatch ||
-              rowCount < limit ||
-              limit < 0)) {
+            rowCountInLastBatch == 0 ||
+              // If either limit is hit, create a batch. This implies that the 
limit that is hit
+              // first triggers the creation of a batch even if the other 
limit is not yet hit
+              // hence preferring the more restrictive limit.

Review Comment:
   The comment states "If either limit is hit, create a batch" but the logic 
actually continues the loop when neither limit is exceeded. The comment should 
be clarified to say "Continue adding rows to the batch until either limit is 
exceeded" or similar wording to accurately reflect the loop continuation 
condition rather than batch creation trigger.
   ```suggestion
                 // Continue adding rows to the current batch until either 
limit is exceeded.
                 // The limit that is reached first determines the batch 
boundary, even if the
                 // other limit has not yet been reached, thus preferring the 
more restrictive limit.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [KYUUBI #7245] Fix arrow batch converter error [kyuubi]

Reply via email to