FelixYBW commented on issue #6947: URL: https://github.com/apache/incubator-gluten/issues/6947#issuecomment-2495851658
These three configurations have a big impact on off-heap and overhead memory usage:

  spark.gluten.sql.columnar.backend.velox.spillWriteBufferSize
  spark.gluten.sql.columnar.backend.velox.MaxSpillRunRows
  spark.gluten.sql.columnar.backend.velox.maxSpillFileSize

spillWriteBufferSize controls the size of the buffer used when a spill writes data to disk. It appears to also control the read buffer size when spilled data is fetched back. Each spill file must have one buffer allocated in off-heap memory, so if the size is too large, an OOM error triggered by getOutput is reported.

MaxSpillRunRows controls the batch size of a spill. The larger the value, the more overhead memory is allocated, because all memory allocated during a spill is overhead memory. The smaller the value, the more spill files are produced.

maxSpillFileSize controls the size of each spill file. The smaller the value, the more spill files are produced.

#8025
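The interaction described in the comment above (one buffer per spill file, and more files when the file size is smaller) can be illustrated with some rough arithmetic. The following is only a back-of-the-envelope sketch, not Gluten or Velox code. The function name `estimate_spill_buffers` and all input values are hypothetical. It assumes the number of files is roughly the spilled bytes divided by maxSpillFileSize, and that all per-file buffers are alive at the same time.

```python
MiB = 1024 * 1024
GiB = 1024 * MiB


def estimate_spill_buffers(spilled_bytes, max_spill_file_size, spill_write_buffer_size):
    """Return (approx_file_count, approx_buffer_bytes) for one spilling operator.

    Hypothetical helper for illustration only. Assumptions:
      - file count ~= ceil(spilled_bytes / max_spill_file_size)
      - every spill file holds one buffer of spill_write_buffer_size bytes,
        and all of those buffers exist simultaneously
    """
    if spilled_bytes <= 0:
        return 0, 0
    # Integer ceiling division, avoids floating-point rounding.
    files = -(-spilled_bytes // max_spill_file_size)
    return files, files * spill_write_buffer_size


if __name__ == "__main__":
    # Illustrative values only, not defaults or recommendations.
    for file_size in (1 * GiB, 64 * MiB):
        files, buf = estimate_spill_buffers(8 * GiB, file_size, 4 * MiB)
        print(f"maxSpillFileSize={file_size // MiB} MiB -> "
              f"{files} files, ~{buf // MiB} MiB of buffers")
```

Under these assumptions, shrinking the file size from 1 GiB to 64 MiB for the same 8 GiB of spilled data raises the estimate from 8 files and about 32 MiB of buffers to 128 files and about 512 MiB. This matches the comment's point that a large buffer size combined with many spill files can exhaust off-heap memory.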
