L. C. Hsieh created SPARK-50235:
-----------------------------------
Summary: Clean up ColumnVector resource after processing all rows in ColumnarToRowExec
Key: SPARK-50235
URL: https://issues.apache.org/jira/browse/SPARK-50235
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.5.3, 3.4.4, 4.0.0
Reporter: L. C. Hsieh
Currently, we only assign null to the ColumnarBatch object, which does not
release the resources held by the vectors in the batch. For
OnHeapColumnVector, the Java arrays can be reclaimed by the JVM garbage
collector, but for OffHeapColumnVector, the allocated off-heap memory is leaked.
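To illustrate why assigning null is not enough for off-heap memory, here is a minimal, self-contained Java sketch. It does not use Spark: SimulatedOffHeapVector and its liveBytes counter are hypothetical stand-ins that model memory the garbage collector cannot reclaim, so the only way to give it back is an explicit close().

```java
import java.util.concurrent.atomic.AtomicLong;

// Entry point comes first so the single-file launcher (java File.java) picks it up.
class NullVsCloseDemo {
    public static void main(String[] args) {
        SimulatedOffHeapVector dropped = new SimulatedOffHeapVector(1024);
        dropped = null; // only the reference is gone; its bytes are still counted as live
        System.out.println("live after null:  " + SimulatedOffHeapVector.liveBytes.get());

        SimulatedOffHeapVector released = new SimulatedOffHeapVector(1024);
        released.close(); // gives back its own 1024 bytes; the dropped vector's bytes remain
        System.out.println("live after close: " + SimulatedOffHeapVector.liveBytes.get());
    }
}

// Hypothetical stand-in for an off-heap column vector (not a Spark class).
// The static counter plays the role of a native allocator's bookkeeping:
// nothing decrements it except close(), just as the GC never frees native memory.
class SimulatedOffHeapVector implements AutoCloseable {
    static final AtomicLong liveBytes = new AtomicLong();
    private final long size;
    private boolean closed = false;

    SimulatedOffHeapVector(long size) {
        this.size = size;
        liveBytes.addAndGet(size); // models allocating off-heap memory
    }

    @Override
    public void close() {
        if (!closed) {                  // idempotent: a second close() is a no-op
            closed = true;
            liveBytes.addAndGet(-size); // models freeing off-heap memory
        }
    }
}
```

Both lines print the same total because the nulled vector's allocation is never returned, while the closed one's is.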
For custom ColumnVector implementations, such as Arrow-based ones, this can
also cause memory-safety issues if the underlying buffers are reused across
batches: when ColumnarToRowExec begins to fill values for the next batch, the
arrays of the previous batch are still held.
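The cleanup described in the summary can be sketched as a batch-to-row iterator that closes each batch once all of its rows have been returned, before pulling the next batch. This is a minimal illustration in plain Java, not the actual ColumnarToRowExec change: Batch and BatchRowIterator are hypothetical classes, and the sketch assumes the iterator owns each batch and that the producer does not reuse the same batch object afterwards. A real implementation would have to respect whatever ownership rules the producer imposes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Entry point first: two batches with 2 and 1 rows, printing the event order.
class CloseOnExhaustDemo {
    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        List<Batch> batches = List.of(new Batch("b0", 2, events), new Batch("b1", 1, events));
        Iterator<String> rows = new BatchRowIterator(batches.iterator(), events);
        while (rows.hasNext()) {
            rows.next();
        }
        System.out.println(events);
    }
}

// Hypothetical columnar batch: exposes rows and releases its buffers in close().
class Batch implements AutoCloseable {
    final String name;
    final int numRows;
    boolean closed = false;
    private final List<String> events;

    Batch(String name, int numRows, List<String> events) {
        this.name = name;
        this.numRows = numRows;
        this.events = events;
    }

    String row(int i) {
        if (closed) throw new IllegalStateException(name + " already closed");
        return name + "[" + i + "]";
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            events.add("close " + name);
        }
    }
}

// Flattens batches into rows. Rather than only dropping the reference to a
// finished batch, it closes the batch after its last row has been returned and
// before the next batch is pulled from upstream.
class BatchRowIterator implements Iterator<String> {
    private final Iterator<Batch> upstream;
    private final List<String> events;
    private Batch current = null;
    private int rowId = 0;

    BatchRowIterator(Iterator<Batch> upstream, List<String> events) {
        this.upstream = upstream;
        this.events = events;
    }

    @Override
    public boolean hasNext() {
        // Loop so that empty batches are closed and skipped as well.
        while (current == null || rowId >= current.numRows) {
            if (current != null) {
                current.close(); // release the exhausted batch's resources
                current = null;
            }
            if (!upstream.hasNext()) return false;
            current = upstream.next();
            events.add("load " + current.name);
            rowId = 0;
        }
        return true;
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        String r = current.row(rowId++);
        events.add("row " + r);
        return r;
    }
}
```

Closing happens lazily in hasNext(), so a batch is released only once the caller asks for a row past its end. That matters if returned rows are views into the batch's vectors rather than copies, as the caller must be done with the previous row by then.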
--
This message was sent by Atlassian Jira
(v8.20.10#820010)