Re: [PR] [fix](maxcompute) Estimate write block size from Arrow buffers, not per-row serialization [doris]

via GitHub Tue, 23 Jun 2026 01:03:58 -0700


github-actions[bot] commented on code in PR #64612:
URL: https://github.com/apache/doris/pull/64612#discussion_r3458025412



##########
fe/be-java-extensions/max-compute-connector/src/main/java/org/apache/doris/maxcompute/MaxComputeJniWriter.java:
##########
@@ -272,79 +269,110 @@ private void rotateCurrentBatchWriter() throws IOException {
         openBatchWriter(requestBlockId());
     }
 
-    private void writeRowsWithRowChecks(VectorTable inputTable, int numRows, int numCols) throws IOException {
+    private void writeBatch(VectorTable inputTable, int numRows, int numCols) throws IOException {
         int rowStart = 0;
         int rowStart = 0;
         while (rowStart < numRows) {

Review Comment:
   The split path still builds an Arrow root for the entire remaining JNI block 
before it knows whether the block fits. `rowEnd` is set to `numRows` here and 
`buildRowRangeRoot(inputTable, numCols, rowStart, rowEnd)` allocates/fills 
vectors for every remaining row before `estimateBatchPayloadBytes` can fall 
through to `findPartialRowRange`. Since `VJniFormatTransformer.write()` passes 
the whole BE block to Java in one `VectorTable`, a large incoming block that 
should be split by `mc.write_max_block_bytes` still has to be copied into one 
full-size `VectorSchemaRoot` first, so this fix can still hit the same heap 
pressure it is trying to avoid. Please choose the bounded row range before 
materializing Arrow vectors, and add a `writeBatch`-level test/injection that 
fails if an oversized input first requests `rowStart..numRows`.
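
   A minimal sketch of the requested ordering, not the connector's actual code: it assumes a cheap per-row byte estimate can be read from the input without building Arrow vectors. The class `BoundedRowRangeSketch`, the methods `chooseRowEnd` and `planRanges`, and the `rowBytes` estimator are all hypothetical names introduced for illustration. The point is that the row range is bounded first, so only `rowStart..rowEnd` would ever be materialized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToLongFunction;

// Hypothetical sketch: pick a bounded row range BEFORE any Arrow vectors are built.
public final class BoundedRowRangeSketch {

    // Returns the exclusive end of the largest range [rowStart, rowEnd) whose
    // estimated size stays within maxBytes. A single oversized row is still
    // returned on its own so the caller always makes progress.
    // rowBytes is an assumed estimator, not an existing Doris API.
    static int chooseRowEnd(IntToLongFunction rowBytes, int rowStart, int numRows, long maxBytes) {
        long total = 0;
        int rowEnd = rowStart;
        while (rowEnd < numRows) {
            long next = rowBytes.applyAsLong(rowEnd);
            if (rowEnd > rowStart && total + next > maxBytes) {
                break; // adding this row would exceed the limit
            }
            total += next;
            rowEnd++;
        }
        return rowEnd;
    }

    // Records every range that would be materialized. A test can inspect this
    // list and fail if the first request covers the whole oversized input.
    static List<int[]> planRanges(IntToLongFunction rowBytes, int numRows, long maxBytes) {
        List<int[]> ranges = new ArrayList<>();
        int rowStart = 0;
        while (rowStart < numRows) {
            int rowEnd = chooseRowEnd(rowBytes, rowStart, numRows, maxBytes);
            ranges.add(new int[] {rowStart, rowEnd});
            rowStart = rowEnd;
        }
        return ranges;
    }
}
```

   In a real `writeBatch`, each planned range would be the only argument ever passed to `buildRowRangeRoot`, and the recorded list is the kind of hook a `writeBatch`-level test could assert on. How accurate a pre-materialization estimate can be for variable-width columns is left open here.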



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.