echo567 opened a new issue, #7245: URL: https://github.com/apache/kyuubi/issues/7245
### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

### Search before asking

- [x] I have searched in the [issues](https://github.com/apache/kyuubi/issues?q=is%3Aissue) and found no similar issues.

### Describe the bug

When `kyuubi.operation.result.format=arrow`, `spark.connect.grpc.arrow.maxBatchSize` does not take effect.

Reproduction: debug `KyuubiArrowConverters`, or add the following log statement at line 300 of `KyuubiArrowConverters`:

```scala
logInfo(s"Total limit: ${limit}, rowCount: ${rowCount}, " +
  s"rowCountInLastBatch:${rowCountInLastBatch}," +
  s"estimatedBatchSize: ${estimatedBatchSize}," +
  s"maxEstimatedBatchSize: ${maxEstimatedBatchSize}," +
  s"maxRecordsPerBatch:${maxRecordsPerBatch}")
```

Test data: 1.6 million rows, 30 columns per row.

Command executed:

```shell
bin/beeline -u 'jdbc:hive2://10.168.X.X:XX/default;thrift.client.max.message.size=2000000000' \
  --hiveconf kyuubi.operation.result.format=arrow \
  -n test -p 'testpass' \
  --outputformat=csv2 \
  -e "select * from db.table" > /tmp/test.csv
```

Log output:

```logtalk
25/11/13 13:52:57 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 200000, lastBatchRowCount:200000, estimatedBatchSize: 145600000 maxEstimatedBatchSize: 4,maxRecordsPerBatch:10000
25/11/13 13:52:57 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 200000, lastBatchRowCount:200000, estimatedBatchSize: 145600000
```

All 200000 rows end up in a single batch, even though `maxRecordsPerBatch` is 10000.

Original code:

```scala
while (rowIter.hasNext && (
  rowCountInLastBatch == 0 && maxEstimatedBatchSize > 0 ||
    estimatedBatchSize <= 0 ||
    estimatedBatchSize < maxEstimatedBatchSize ||
    maxRecordsPerBatch <= 0 ||
    rowCountInLastBatch < maxRecordsPerBatch ||
    rowCount < limit ||
    limit < 0))
```

DeepSeek's explanation is as follows. The loop has the shape `while (rowIter.hasNext && (condition A || condition B || condition C || condition D || condition E || condition F))`, where:

- **Condition A**: `rowCountInLastBatch == 0 && maxEstimatedBatchSize > 0`. Special handling for writing the first row:
  - `rowCountInLastBatch == 0`: nothing has been written to the current batch yet, i.e. this is its first row.
  - `maxEstimatedBatchSize > 0`: a valid maximum batch size is configured.
  - Meaning: when a valid batch size limit is set, the first row of a batch is always written.
- **Condition B**: `estimatedBatchSize <= 0`. Unlimited byte size: if the estimated batch size is ≤ 0, there is no limit.
- **Condition C**: `estimatedBatchSize < maxEstimatedBatchSize`. Byte size not exceeded: the current estimated size is below the maximum allowed size.
- **Condition D**: `maxRecordsPerBatch <= 0`. Unlimited record count: if the maximum number of records per batch is ≤ 0, there is no limit.
- **Condition E**: `rowCountInLastBatch < maxRecordsPerBatch`. Record count not exceeded: the number of records in the current batch is below the limit.
- **Condition F**: `rowCount < limit || limit < 0`. Total row count control:
  - `rowCount < limit`: the total number of rows processed has not reached the limit.
  - `limit < 0`: a negative total row limit means there is no limit.

Because all conditions are joined with `||`, the loop follows a "continue as long as any condition is met" strategy: a single true condition keeps adding rows to the current batch. When no limit is set (`limit` is -1), condition F is always true, so the byte-size and record-count checks in conditions C and E are never enforced and all remaining data is put into one batch at once. With a large row count, this causes three problems:

1. Driver/executor OOM.
2. Array OOM, because the array length is not large enough to hold the data.
3. Slow data transfer.

After updating the code, the log output is as follows:

```logtalk
25/11/14 10:57:16 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 5762, rowCountInLastBatch:5762, estimatedBatchSize: 4194736, maxEstimatedBatchSize: 4194304, maxRecordsPerBatch:10000
25/11/14 10:57:16 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 11524, rowCountInLastBatch: 5762, estimatedBatchSize: 4194736, maxEstimatedBatchSize: 4194304, maxRecordsPerBatch: 10000
25/11/14 10:57:16 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 17286, rowCountInLastBatch: 5762, estimatedBatchSize: 4194736, maxEstimatedBatchSize: 4194304, maxRecordsPerBatch: 10000
```

The `estimatedBatchSize` (4194736) is slightly larger than `maxEstimatedBatchSize` (4194304).
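The difference between the two logs can be reproduced with a small stand-alone simulation. This is a hypothetical sketch, not Kyuubi's code and not the actual patch: the object `BatchConditionSketch`, its methods, and the fixed 728-byte row size are assumptions for illustration. The row size is derived from the logs (145600000 / 200000 = 728, and 4194736 / 5762 = 728). `orCondition` is the condition quoted above. `andCondition` is one possible correction: it always applies the total `limit`, always admits the first row of a batch, and otherwise requires both per-batch limits to still have room, treating non-positive limits as unlimited.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical simulation for illustration only; this is not Kyuubi code.
// Rows are modelled by a fixed estimated size in bytes, and nothing is
// actually written to Arrow vectors.
object BatchConditionSketch {

  // (rowCountInLastBatch, estimatedBatchSize, maxEstimatedBatchSize,
  //  maxRecordsPerBatch, rowCount, limit) => keep adding rows to this batch?
  type Cond = (Long, Long, Long, Long, Long, Long) => Boolean

  // The condition quoted in the issue: every clause is OR-ed, so `limit < 0`
  // alone keeps the loop going until the input is exhausted.
  def orCondition(
      rowCountInLastBatch: Long,
      estimatedBatchSize: Long,
      maxEstimatedBatchSize: Long,
      maxRecordsPerBatch: Long,
      rowCount: Long,
      limit: Long): Boolean =
    rowCountInLastBatch == 0 && maxEstimatedBatchSize > 0 ||
      estimatedBatchSize <= 0 ||
      estimatedBatchSize < maxEstimatedBatchSize ||
      maxRecordsPerBatch <= 0 ||
      rowCountInLastBatch < maxRecordsPerBatch ||
      rowCount < limit ||
      limit < 0

  // One possible correction (an assumption, not the actual patch): the total
  // limit always applies; the first row of a batch is always admitted;
  // otherwise both per-batch limits must still have room. Non-positive
  // limits are treated as "unlimited".
  def andCondition(
      rowCountInLastBatch: Long,
      estimatedBatchSize: Long,
      maxEstimatedBatchSize: Long,
      maxRecordsPerBatch: Long,
      rowCount: Long,
      limit: Long): Boolean = {
    val underTotalLimit = limit < 0 || rowCount < limit
    val firstRowOfBatch = rowCountInLastBatch == 0 && maxEstimatedBatchSize > 0
    val underByteLimit =
      maxEstimatedBatchSize <= 0 || estimatedBatchSize < maxEstimatedBatchSize
    val underRowLimit =
      maxRecordsPerBatch <= 0 || rowCountInLastBatch < maxRecordsPerBatch
    underTotalLimit && (firstRowOfBatch || (underByteLimit && underRowLimit))
  }

  // Splits `totalRows` rows of `rowBytes` estimated bytes each into batches
  // using `cond`, and returns the number of rows in each batch.
  def batchSizes(
      totalRows: Long,
      rowBytes: Long,
      maxEstimatedBatchSize: Long,
      maxRecordsPerBatch: Long,
      limit: Long,
      cond: Cond): List[Long] = {
    val batches = ArrayBuffer.empty[Long]
    var rowCount = 0L
    while (rowCount < totalRows && (limit < 0 || rowCount < limit)) {
      var rowCountInLastBatch = 0L
      var estimatedBatchSize = 0L
      // Plays the role of `while (rowIter.hasNext && (...))` in the issue.
      while (rowCount < totalRows &&
          cond(rowCountInLastBatch, estimatedBatchSize, maxEstimatedBatchSize,
            maxRecordsPerBatch, rowCount, limit)) {
        // The check runs before the row is added, so the row that pushes the
        // estimate past maxEstimatedBatchSize still lands in this batch.
        estimatedBatchSize += rowBytes
        rowCountInLastBatch += 1
        rowCount += 1
      }
      batches += rowCountInLastBatch
    }
    batches.toList
  }

  def main(args: Array[String]): Unit = {
    // Figures derived from the issue's logs: 728 bytes per row, a 4194304-byte
    // batch size, 10000 records per batch, and no total limit (-1).
    val before = batchSizes(200000L, 728L, 4194304L, 10000L, -1L, orCondition)
    val after = batchSizes(200000L, 728L, 4194304L, 10000L, -1L, andCondition)
    println(s"OR condition:  ${before.length} batch(es), first has ${before.head} rows")
    println(s"AND condition: ${after.length} batch(es), first has ${after.head} rows")
  }
}
```

Under these assumptions, the OR form yields a single 200000-row batch, which matches the first log. The AND form yields 35 batches. The first of these holds 5762 rows, or 5762 × 728 = 4194736 estimated bytes, which matches the post-change log. In the sketch, the overshoot happens because the size check runs before a row is appended. Before the 5762nd row is written, the estimate is 5761 × 728 = 4194008 bytes, which is still under 4194304, so that row is admitted and pushes the batch past the limit. With these figures, the byte limit binds before `maxRecordsPerBatch` (10000) does.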
Data is now written in batches as expected.

### Affects Version(s)

master

### Kyuubi Server Log Output

```logtalk

```

### Kyuubi Engine Log Output

```logtalk
25/11/13 13:52:57 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 200000, lastBatchRowCount:200000, estimatedBatchSize: 145600000 maxEstimatedBatchSize: 4,maxRecordsPerBatch:10000
25/11/13 13:52:57 INFO KyuubiArrowConverters: Total limit: -1, rowCount: 200000, lastBatchRowCount:200000, estimatedBatchSize: 145600000
```

### Kyuubi Server Configurations

```yaml

```

### Kyuubi Engine Configurations

```yaml

```

### Additional context

Test data: 1.6 million rows, 30 columns per row.

### Are you willing to submit PR?

- [x] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
- [ ] No. I cannot submit a PR at this time.
