echo567 opened a new issue, #7245:
URL: https://github.com/apache/kyuubi/issues/7245

   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [x] I have searched in the 
[issues](https://github.com/apache/kyuubi/issues?q=is%3Aissue) and found no 
similar issues.
   
   
   ### Describe the bug
   
   When `kyuubi.operation.result.format=arrow`, 
`spark.connect.grpc.arrow.maxBatchSize` does not take effect.
   
   Reproduction: You can debug `KyuubiArrowConverters` or add the following log at line 300 of `KyuubiArrowConverters`:

   ```scala
   logInfo(s"Total limit: ${limit}, rowCount: ${rowCount}, " +
     s"rowCountInLastBatch: ${rowCountInLastBatch}, " +
     s"estimatedBatchSize: ${estimatedBatchSize}, " +
     s"maxEstimatedBatchSize: ${maxEstimatedBatchSize}, " +
     s"maxRecordsPerBatch: ${maxRecordsPerBatch}")
   ```
   
   Test data: 1.6 million rows, 30 columns per row. Command executed:

   ```shell
   bin/beeline -u 'jdbc:hive2://10.168.X.X:XX/default;thrift.client.max.message.size=2000000000' \
     --hiveconf kyuubi.operation.result.format=arrow -n test -p 'testpass' \
     --outputformat=csv2 -e "select * from db.table" > /tmp/test.csv
   ```
   
   
   Log output:

   ```logtalk
   25/11/13 13:52:57 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 200000, lastBatchRowCount: 200000, estimatedBatchSize: 145600000, maxEstimatedBatchSize: 4, maxRecordsPerBatch: 10000
   25/11/13 13:52:57 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 200000, lastBatchRowCount: 200000, estimatedBatchSize: 145600000
   ```
   
   Original code:

   ```scala
   while (rowIter.hasNext && (
       rowCountInLastBatch == 0 && maxEstimatedBatchSize > 0 ||
       estimatedBatchSize <= 0 ||
       estimatedBatchSize < maxEstimatedBatchSize ||
       maxRecordsPerBatch <= 0 ||
       rowCountInLastBatch < maxRecordsPerBatch ||
       rowCount < limit ||
       limit < 0))
   ```
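   Because every clause in this predicate is joined with `||`, any single true clause keeps the loop running. In particular, with the default `limit = -1` the final `limit < 0` clause is always true, so the size and record-count caps can never stop the batch. A minimal sketch (plain Scala, using the row counts and sizes observed in the log above, with the size cap taken as the 4 MiB default for illustration) shows this:

   ```scala
   // Values observed in the first log line: no total limit, and the batch is
   // already far beyond both the size cap and the record cap.
   val limit = -1L
   val rowCount = 200000L
   val rowCountInLastBatch = 200000L
   val estimatedBatchSize = 145600000L
   val maxEstimatedBatchSize = 4194304L // assumed 4 MiB cap for illustration
   val maxRecordsPerBatch = 10000

   // The original OR-chained continuation predicate (rowIter.hasNext taken as true).
   val continue =
     rowCountInLastBatch == 0 && maxEstimatedBatchSize > 0 ||
       estimatedBatchSize <= 0 ||
       estimatedBatchSize < maxEstimatedBatchSize ||
       maxRecordsPerBatch <= 0 ||
       rowCountInLastBatch < maxRecordsPerBatch ||
       rowCount < limit ||
       limit < 0

   // Every cap is exceeded, yet `limit < 0` alone satisfies the disjunction.
   println(s"continue = $continue") // prints "continue = true"
   ```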
   
   DeepSeek's explanation is as follows:

   `while (rowIter.hasNext && (condition A || condition B || condition C || condition D || condition E || condition F))`
   
   Detailed explanation of each condition:

   - Condition A: `rowCountInLastBatch == 0 && maxEstimatedBatchSize > 0` — special handling for the first row of a batch. `rowCountInLastBatch == 0` means the current batch is still empty; `maxEstimatedBatchSize > 0` means a valid maximum batch size is set. Meaning: when a valid batch size limit is set, the first row of the current batch is always written.
   - Condition B: `estimatedBatchSize <= 0` — unlimited byte size: if the estimated batch size is ≤ 0, there is no size limit.
   - Condition C: `estimatedBatchSize < maxEstimatedBatchSize` — byte size not exceeded: the current estimated size is less than the maximum allowed size.
   - Condition D: `maxRecordsPerBatch <= 0` — unlimited record count: if the maximum number of records per batch is ≤ 0, there is no record limit.
   - Condition E: `rowCountInLastBatch < maxRecordsPerBatch` — record count not exceeded: the number of records in the current batch is less than the limit.
   - Condition F: `rowCount < limit || limit < 0` — total row count control: `rowCount < limit` means the total number of rows processed has not reached the limit, and `limit < 0` means the total row limit is negative, i.e., no limit.

   This is a "continue as long as any condition is met" strategy.
   
   When the limit is not set (i.e., -1), all data is retrieved in a single batch. If the row count is too large, three problems arise:

   (1) Driver/executor OOM

   (2) Array OOM, because the required array length exceeds what can be allocated

   (3) Slow data transfer
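   One way to restructure the predicate (a hypothetical sketch, not the actual patch) is to require all limits to hold simultaneously, so that exceeding any single cap ends the batch, while still always admitting the first row of a batch:

   ```scala
   // Hypothetical restructuring: each limit can now stop the batch on its own.
   // A non-positive max still means "unlimited" for that dimension, as in the
   // original code.
   def shouldContinue(
       hasNext: Boolean,
       rowCount: Long,
       rowCountInLastBatch: Long,
       estimatedBatchSize: Long,
       maxEstimatedBatchSize: Long,
       maxRecordsPerBatch: Int,
       limit: Long): Boolean = {
     val underSizeLimit =
       maxEstimatedBatchSize <= 0 || estimatedBatchSize < maxEstimatedBatchSize
     val underRecordLimit =
       maxRecordsPerBatch <= 0 || rowCountInLastBatch < maxRecordsPerBatch
     val underTotalLimit = limit < 0 || rowCount < limit
     hasNext && underTotalLimit &&
       (rowCountInLastBatch == 0 || (underSizeLimit && underRecordLimit))
   }

   // With the values from the updated log, the batch stops as soon as the
   // estimated size passes maxEstimatedBatchSize:
   println(shouldContinue(
     hasNext = true, rowCount = 5762L, rowCountInLastBatch = 5762L,
     estimatedBatchSize = 4194736L, maxEstimatedBatchSize = 4194304L,
     maxRecordsPerBatch = 10000, limit = -1L)) // false: size cap exceeded
   ```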
   
   After updating the code, the log output is as follows:
   
   ```logtalk
   25/11/14 10:57:16 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 5762, rowCountInLastBatch: 5762, estimatedBatchSize: 4194736, maxEstimatedBatchSize: 4194304, maxRecordsPerBatch: 10000
   25/11/14 10:57:16 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 11524, rowCountInLastBatch: 5762, estimatedBatchSize: 4194736, maxEstimatedBatchSize: 4194304, maxRecordsPerBatch: 10000
   25/11/14 10:57:16 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 17286, rowCountInLastBatch: 5762, estimatedBatchSize: 4194736, maxEstimatedBatchSize: 4194304, maxRecordsPerBatch: 10000
   ```
   
   The estimatedBatchSize is slightly larger than the maxEstimatedBatchSize. 
Data can be written in batches as expected.
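   The numbers are internally consistent (a quick cross-check, assuming a fixed estimated row width): the first log gives 145600000 bytes / 200000 rows = 728 bytes per row, and against a 4 MiB (4194304-byte) cap, 5762 is exactly the first row count whose estimated size (4194736 bytes) exceeds the cap:

   ```scala
   // Cross-check of the logged values, assuming a fixed estimated row width
   // derived from the first log: 145600000 bytes over 200000 rows.
   val bytesPerRow = 145600000L / 200000L        // 728 bytes per row
   val maxBatch = 4L * 1024 * 1024               // 4194304: assumed 4 MiB cap
   val rowsPerBatch = maxBatch / bytesPerRow + 1 // first row count over the cap

   println(bytesPerRow)                // 728
   println(rowsPerBatch)               // 5762, matching rowCountInLastBatch
   println(bytesPerRow * rowsPerBatch) // 4194736, matching estimatedBatchSize
   ```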
   
   ### Affects Version(s)
   
   master
   
   ### Kyuubi Server Log Output
   
   ```logtalk
   
   ```
   
   ### Kyuubi Engine Log Output
   
   ```logtalk
   25/11/13 13:52:57 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 200000, lastBatchRowCount: 200000, estimatedBatchSize: 145600000, maxEstimatedBatchSize: 4, maxRecordsPerBatch: 10000
   25/11/13 13:52:57 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 200000, lastBatchRowCount: 200000, estimatedBatchSize: 145600000
   ```
   
   ### Kyuubi Server Configurations
   
   ```yaml
   
   ```
   
   ### Kyuubi Engine Configurations
   
   ```yaml
   
   ```
   
   ### Additional context
   
   Test data: 1.6 million rows, 30 columns per row
   
   ### Are you willing to submit PR?
   
   - [x] Yes. I would be willing to submit a PR with guidance from the Kyuubi 
community to fix.
   - [ ] No. I cannot submit a PR at this time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

