L. C. Hsieh created SPARK-50235:
-----------------------------------

             Summary: Clean up ColumnVector resource after processing all rows 
in ColumnarToRowExec
                 Key: SPARK-50235
                 URL: https://issues.apache.org/jira/browse/SPARK-50235
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.3, 3.4.4, 4.0.0
            Reporter: L. C. Hsieh


Currently, ColumnarToRowExec only assigns null to the ColumnarBatch object after 
processing all of its rows, which does not release the resources held by the 
vectors in the batch. For OnHeapColumnVector, the Java arrays can be collected 
automatically by the JVM, but for OffHeapColumnVector, the allocated off-heap 
memory is leaked.

For custom ColumnVector implementations, such as Arrow-based ones, this can 
also cause memory-safety issues if the underlying buffers are reused across 
batches, because the arrays of the previous batch are still held when 
ColumnarToRowExec begins to fill in values for the next batch.
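
The intended behavior can be illustrated with a minimal, self-contained sketch. 
It does not use Spark classes: SketchVector, SketchBatch and SketchRowIterator 
are hypothetical stand-ins for ColumnVector, ColumnarBatch and the row iterator 
generated by ColumnarToRowExec, and the actual fix may be structured 
differently. The sketch only shows the idea of closing a batch once all of its 
rows have been consumed, instead of just dropping the reference.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch; none of these types are Spark APIs.
class ColumnarCleanupSketch {

    // Stand-in for a column vector that owns a resource needing explicit release.
    static final class SketchVector implements AutoCloseable {
        final int[] values;
        private boolean closed = false;

        SketchVector(int[] values) { this.values = values; }

        boolean isClosed() { return closed; }

        @Override
        public void close() { closed = true; } // a real vector would free its memory here
    }

    // Stand-in for a batch; closing the batch closes its vector.
    static final class SketchBatch implements AutoCloseable {
        final SketchVector column;

        SketchBatch(int[] values) { this.column = new SketchVector(values); }

        int numRows() { return column.values.length; }

        @Override
        public void close() { column.close(); }
    }

    // Converts batches to rows and releases each batch after its last row is read.
    static final class SketchRowIterator implements Iterator<Integer> {
        private final Iterator<SketchBatch> batches;
        private SketchBatch current;
        private int rowIdx;

        SketchRowIterator(Iterator<SketchBatch> batches) { this.batches = batches; }

        @Override
        public boolean hasNext() {
            while (current == null || rowIdx >= current.numRows()) {
                if (current != null) {
                    current.close(); // release resources, not only the reference
                    current = null;
                }
                if (!batches.hasNext()) {
                    return false;
                }
                current = batches.next();
                rowIdx = 0;
            }
            return true;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.column.values[rowIdx++];
        }
    }

    public static void main(String[] args) {
        List<SketchBatch> source = new ArrayList<>();
        source.add(new SketchBatch(new int[] {1, 2}));
        source.add(new SketchBatch(new int[] {3}));

        SketchRowIterator rows = new SketchRowIterator(source.iterator());
        while (rows.hasNext()) {
            System.out.println("row: " + rows.next());
        }
        for (SketchBatch batch : source) {
            System.out.println("closed: " + batch.column.isClosed());
        }
    }
}
```

In this sketch the previous batch is closed before the next one is fetched, so 
no vector from an earlier batch is still held while values for the next batch 
are being filled in. The last batch is closed when hasNext() finds no further 
input.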





