Wan Kun created SPARK-44239:
-------------------------------
Summary: Reclaim memory allocated by huge column vector
Key: SPARK-44239
URL: https://issues.apache.org/jira/browse/SPARK-44239
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.5.0
Reporter: Wan Kun
When Spark reads data files into WritableColumnVectors, the memory allocated by
the WritableColumnVectors is not freed until the VectorizedColumnReader is
finished.
Reusing the allocated array objects saves memory allocation time. But it also
keeps a large amount of unused memory occupied after the current large vector
batch has already been read.
Add a vector reserve policy for this scenario, which reuses the allocated
array object for small column vectors and frees the memory for huge column
vectors.
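
A minimal sketch of what such a reserve policy could look like, for
illustration only. The class ReusableIntBuffer, its methods, and the
hugeThreshold and defaultCapacity parameters are hypothetical names invented
for this sketch. They are not part of Spark's WritableColumnVector API and are
not the proposed patch.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-in for a column vector's backing storage.
 * Not Spark code: all names and thresholds here are assumptions.
 */
final class ReusableIntBuffer {
    private final int defaultCapacity; // capacity restored after a huge batch
    private final int hugeThreshold;   // arrays larger than this are not kept
    private int[] data;

    ReusableIntBuffer(int defaultCapacity, int hugeThreshold) {
        if (defaultCapacity <= 0 || hugeThreshold < defaultCapacity) {
            throw new IllegalArgumentException("invalid capacities");
        }
        this.defaultCapacity = defaultCapacity;
        this.hugeThreshold = hugeThreshold;
        this.data = new int[defaultCapacity];
    }

    /** Grow the backing array so it can hold at least `required` elements. */
    void reserve(int required) {
        if (required > data.length) {
            // Double the capacity where possible, but never go below `required`.
            int doubled = (int) Math.min(2L * data.length, Integer.MAX_VALUE - 8);
            data = Arrays.copyOf(data, Math.max(required, doubled));
        }
    }

    /**
     * Called between batches. Small arrays are kept for reuse, which saves
     * allocation time. Huge arrays are dropped so the garbage collector can
     * reclaim them, and a default-sized array takes their place.
     */
    void reset() {
        if (data.length > hugeThreshold) {
            data = new int[defaultCapacity];
        }
    }

    int capacity() {
        return data.length;
    }
}
```

With new ReusableIntBuffer(4, 16), a batch that reserves 10 elements keeps its
array across reset(), while a batch that reserves 100 elements falls back to
the default capacity of 4 on the next reset().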