[
https://issues.apache.org/jira/browse/SPARK-44239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wan Kun updated SPARK-44239:
----------------------------
Description:
When Spark reads a data file into WritableColumnVectors, the memory allocated
by those vectors is not freed until the VectorizedColumnReader completes.
Reusing the allocated array objects saves memory allocation time, but the
arrays also hold on to a large amount of unused memory after a large vector
batch has been read.
!image-2023-06-29-12-58-12-256.png!
Add a vector reserve policy for this scenario that reuses the allocated array
objects for small column vectors and frees the memory for huge column
vectors.
was:
When spark read data files into WritableColumnVectors, the memory allocated by
the WritableColumnVectors will not be free until the VectorizedColumnReader is
finished.
It will save the memory allocation time though reusing the allocated array
object. But it will also occupy too many unused memory after the current large
vector batch is already read.
Add a vector reserve policy for this scenario, which will use the allocated
array object for small column vectors and free up the memory for huge column
vectors.
> Reclaim memory allocated by huge column vector
> ----------------------------------------------
>
> Key: SPARK-44239
> URL: https://issues.apache.org/jira/browse/SPARK-44239
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Wan Kun
> Priority: Major
> Attachments: image-2023-06-29-12-58-12-256.png
>
>
> When Spark reads a data file into WritableColumnVectors, the memory
> allocated by those vectors is not freed until the VectorizedColumnReader
> completes.
> Reusing the allocated array objects saves memory allocation time, but the
> arrays also hold on to a large amount of unused memory after a large vector
> batch has been read.
> !image-2023-06-29-12-58-12-256.png!
> Add a vector reserve policy for this scenario that reuses the allocated
> array objects for small column vectors and frees the memory for huge column
> vectors.
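
The reserve policy described above could look roughly like the sketch below.
This is only an illustration of the idea, not Spark's code: the class
ReservePolicySketch, its fields, and the threshold value are all hypothetical,
and a plain int[] stands in for the backing storage of a real
WritableColumnVector. The vector grows on demand, keeps its array across
batches while it stays small, and drops the array once it has grown past a
"huge" threshold.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a column vector; not part of Spark's API. */
final class ReservePolicySketch {
    private final int defaultCapacity; // size the buffer shrinks back to
    private final int hugeThreshold;   // assumed cut-off for a "huge" vector
    private int[] data;

    ReservePolicySketch(int defaultCapacity, int hugeThreshold) {
        this.defaultCapacity = defaultCapacity;
        this.hugeThreshold = hugeThreshold;
        this.data = new int[defaultCapacity];
    }

    /** Current number of elements the backing array can hold. */
    int capacity() {
        return data.length;
    }

    /** Grow the backing array so that at least `required` elements fit. */
    void reserve(int required) {
        if (required > data.length) {
            // Double the size when possible; computed in long to avoid overflow.
            long doubled = 2L * data.length;
            int newCapacity = (int) Math.min(Integer.MAX_VALUE - 8L,
                                             Math.max(required, doubled));
            data = Arrays.copyOf(data, newCapacity);
        }
    }

    /**
     * Called between batches. A small buffer is kept for reuse, which saves
     * allocation time; a huge buffer is replaced by a default-sized one so
     * the large array becomes eligible for garbage collection.
     */
    void reset() {
        if (data.length > hugeThreshold) {
            data = new int[defaultCapacity];
        }
    }
}
```

With a default capacity of 16 and a threshold of 1024, a vector that grew to
100 elements keeps its array after reset(), while one that grew to 5000
elements falls back to 16. Choosing the threshold is the main trade-off: too
low and the allocation savings are lost, too high and the unused memory is
never reclaimed.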