Wan Kun created SPARK-44239:
-------------------------------

             Summary: Reclaim memory allocated by huge column vector
                 Key: SPARK-44239
                 URL: https://issues.apache.org/jira/browse/SPARK-44239
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.5.0
            Reporter: Wan Kun


When Spark reads data files into WritableColumnVectors, the memory allocated by 
the WritableColumnVectors is not freed until the VectorizedColumnReader has 
finished.

Reusing the allocated array objects saves memory allocation time, but it also 
leaves a large amount of unused memory occupied after a large vector batch has 
already been read.

 

Add a vector reserve policy for this scenario: keep reusing the allocated 
array object for small column vectors, and free the memory for huge column 
vectors.
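
The proposed policy could look roughly like the sketch below. This is only an 
illustration and not Spark's actual code: the class name ReservePolicySketch, 
the constants HUGE_THRESHOLD and DEFAULT_CAPACITY, and the methods reserve(), 
reset() and capacity() are all hypothetical, and the threshold value is an 
arbitrary assumption.

```java
import java.util.Arrays;

/** Hypothetical illustration only; this is not Spark's WritableColumnVector. */
class ReservePolicySketch {
    /** Assumed size (in elements) above which a buffer counts as "huge". */
    static final int HUGE_THRESHOLD = 1024;
    /** Assumed capacity to fall back to after releasing a huge buffer. */
    static final int DEFAULT_CAPACITY = 16;

    private int[] data = new int[DEFAULT_CAPACITY];

    /** Grows the backing array when a batch needs more room than is available. */
    void reserve(int required) {
        if (required > data.length) {
            data = Arrays.copyOf(data, Math.max(required, data.length * 2));
        }
    }

    /**
     * Called between batches. Small buffers are kept for reuse; huge buffers
     * are dropped so the garbage collector can reclaim them.
     */
    void reset() {
        if (data.length > HUGE_THRESHOLD) {
            data = new int[DEFAULT_CAPACITY];
        }
    }

    /** Current number of elements the backing array can hold. */
    int capacity() {
        return data.length;
    }
}
```

With this sketch, a vector that grew to 100 elements keeps its array across 
reset(), while one that grew to 5000 elements shrinks back to the default 
capacity. The trade-off is that the next huge batch has to pay the allocation 
cost again.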



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

