wForget commented on issue #10975: URL: https://github.com/apache/incubator-gluten/issues/10975#issuecomment-3466667093
> Or can you help identify where the large memory allocation comes from? Is 1.2 OK but 1.5 failed?

It also fails in 1.2.

The problem appears to be caused by the hash shuffle generating a large number of small ColumnBatches. I suspect that some internal parameters of ColumnBatch are not being managed, leading to excessive memory usage. After adding `spark.gluten.sql.columnar.backend.velox.resizeBatches.shuffleOutput=true`, it works well.

I added some logs in the Parquet writer; these are the logs without resizing ColumnBatch:

```
numRows: 2, bytes: 544, arrowBufferSize: 401, stagingRows: 357152, stagingBytes: 110820492
numRows: 2, bytes: 562, arrowBufferSize: 419, stagingRows: 357154, stagingBytes: 110821054
numRows: 2, bytes: 459, arrowBufferSize: 318, stagingRows: 357156, stagingBytes: 110821513
numRows: 3, bytes: 1110, arrowBufferSize: 549, stagingRows: 357159, stagingBytes: 110822623
numRows: 3, bytes: 1074, arrowBufferSize: 513, stagingRows: 357162, stagingBytes: 110823697
numRows: 2, bytes: 544, arrowBufferSize: 401, stagingRows: 357164, stagingBytes: 110824241
numRows: 3, bytes: 1157, arrowBufferSize: 594, stagingRows: 357167, stagingBytes: 110825398
numRows: 3, bytes: 1098, arrowBufferSize: 537, stagingRows: 357170, stagingBytes: 110826496
numRows: 2, bytes: 556, arrowBufferSize: 413, stagingRows: 357172, stagingBytes: 110827052
numRows: 2, bytes: 491, arrowBufferSize: 350, stagingRows: 357174, stagingBytes: 110827543
numRows: 2, bytes: 460, arrowBufferSize: 319, stagingRows: 357176, stagingBytes: 110828003
numRows: 2, bytes: 564, arrowBufferSize: 421, stagingRows: 357178, stagingBytes: 110828567
numRows: 2, bytes: 465, arrowBufferSize: 324, stagingRows: 357180, stagingBytes: 110829032
numRows: 3, bytes: 1127, arrowBufferSize: 566, stagingRows: 357183, stagingBytes: 110830159
numRows: 2, bytes: 557, arrowBufferSize: 414, stagingRows: 357185, stagingBytes: 110830716
numRows: 2, bytes: 514, arrowBufferSize: 373, stagingRows: 357187, stagingBytes: 110831230
numRows: 2, bytes: 463, arrowBufferSize: 322, stagingRows: 357189, stagingBytes: 110831693
numRows: 2, bytes: 544, arrowBufferSize: 401, stagingRows: 357191, stagingBytes: 110832237
numRows: 3, bytes: 1117, arrowBufferSize: 556, stagingRows: 357194, stagingBytes: 110833354
numRows: 2, bytes: 549, arrowBufferSize: 406, stagingRows: 357196, stagingBytes: 110833903
numRows: 2, bytes: 511, arrowBufferSize: 370, stagingRows: 357198, stagingBytes: 110834414
numRows: 2, bytes: 541, arrowBufferSize: 398, stagingRows: 357200, stagingBytes: 110834955
```
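To put numbers on how small these batches are, here is a minimal Python sketch that parses lines in the format logged above and summarizes them. It is not part of Gluten: the helper names `parse_line` and `summarize` are hypothetical, and the reading of `stagingRows` and `stagingBytes` as running totals is an assumption inferred from the field names, which the code checks against the sample rather than takes for granted. Only the first four log lines are embedded to keep the example short.

```python
import re
from statistics import mean

# First four lines copied from the log excerpt in the comment.
SAMPLE_LOG = """\
numRows: 2, bytes: 544, arrowBufferSize: 401, stagingRows: 357152, stagingBytes: 110820492
numRows: 2, bytes: 562, arrowBufferSize: 419, stagingRows: 357154, stagingBytes: 110821054
numRows: 2, bytes: 459, arrowBufferSize: 318, stagingRows: 357156, stagingBytes: 110821513
numRows: 3, bytes: 1110, arrowBufferSize: 549, stagingRows: 357159, stagingBytes: 110822623
"""

FIELD = re.compile(r"(\w+): (\d+)")


def parse_line(line):
    """Turn one 'key: value, key: value' log line into a dict of ints."""
    return {key: int(value) for key, value in FIELD.findall(line)}


def summarize(log_text):
    """Summarize batch sizes for a block of log lines (hypothetical helper)."""
    records = [parse_line(line) for line in log_text.splitlines() if line.strip()]
    total_rows = sum(r["numRows"] for r in records)
    total_bytes = sum(r["bytes"] for r in records)
    # Assumption to verify, not a documented contract: each staging counter
    # grows by exactly the numRows / bytes of the batch just logged.
    running_totals = all(
        cur["stagingRows"] - prev["stagingRows"] == cur["numRows"]
        and cur["stagingBytes"] - prev["stagingBytes"] == cur["bytes"]
        for prev, cur in zip(records, records[1:])
    )
    return {
        "batches": len(records),
        "mean_rows_per_batch": mean(r["numRows"] for r in records),
        "mean_bytes_per_row": total_bytes / total_rows,
        "staging_counters_are_running_totals": running_totals,
    }


summary = summarize(SAMPLE_LOG)
print(summary)
```

On these four lines the batches average 2.25 rows each, and the staging counters do advance by exactly one batch per line, so the writer is receiving one tiny batch per call when resizing is off. Feeding in the full excerpt, or logs captured with `resizeBatches.shuffleOutput=true`, would allow a direct before/after comparison.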
