[ 
https://issues.apache.org/jira/browse/SPARK-44239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wan Kun updated SPARK-44239:
----------------------------
    Description: 
When Spark reads a data file into a WritableColumnVector, the memory allocated 
by the WritableColumnVector is not freed until the VectorizedColumnReader 
completes.

Reusing the allocated array objects saves memory allocation time, but it also 
holds on to too much unused memory after the current large vector batch has 
been read.

Add a memory reserve policy for this scenario that reuses the allocated array 
objects for small column vectors and frees the memory for huge column 
vectors.
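
A minimal sketch of the kind of reserve policy described above, not the 
actual Spark change. The class IntBuffer, its hugeThreshold parameter and the 
doubling growth strategy are all hypothetical names and choices made for 
illustration; they are not part of WritableColumnVector. On reset, a small 
backing array is kept for reuse, while an array that has grown past the 
threshold is dropped so the garbage collector can reclaim it.

```java
import java.util.Arrays;

/** Illustrative only: a stand-in for a column vector's backing storage. */
public class ReservePolicySketch {

    static final class IntBuffer {
        private final int defaultCapacity; // capacity restored after a huge batch
        private final int hugeThreshold;   // assumed cutoff, in elements
        private int[] data;
        private int count;

        IntBuffer(int defaultCapacity, int hugeThreshold) {
            this.defaultCapacity = defaultCapacity;
            this.hugeThreshold = hugeThreshold;
            this.data = new int[defaultCapacity];
        }

        /** Appends a value, doubling the backing array when it is full. */
        void append(int value) {
            if (count == data.length) {
                data = Arrays.copyOf(data, Math.max(1, data.length * 2));
            }
            data[count++] = value;
        }

        /** Called between batches: reuse small arrays, release huge ones. */
        void reset() {
            count = 0;
            if (data.length > hugeThreshold) {
                // Drop the reference to the huge array so it can be collected.
                data = new int[defaultCapacity];
            }
        }

        int capacity() {
            return data.length;
        }
    }

    public static void main(String[] args) {
        IntBuffer buffer = new IntBuffer(4, 16);
        for (int i = 0; i < 100; i++) {
            buffer.append(i);
        }
        System.out.println("capacity after large batch: " + buffer.capacity());
        buffer.reset();
        System.out.println("capacity after reset: " + buffer.capacity());
    }
}
```

The threshold trades allocation time against retained memory: a higher value 
keeps more arrays around for reuse, a lower value frees memory sooner.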

!image-2023-06-29-12-58-12-256.png!!image-2023-06-29-13-03-15-470.png!

 

  was:
When Spark reads a data file into a WritableColumnVector, the memory allocated 
by the WritableColumnVector is not freed until the VectorizedColumnReader 
completes.

Reusing the allocated array objects saves memory allocation time, but it also 
holds on to too much unused memory after the current large vector batch has 
been read.

Add a reserve policy for this scenario that reuses the allocated array 
objects for small column vectors and frees the memory for huge column vectors.

!image-2023-06-29-12-58-12-256.png!!image-2023-06-29-13-03-15-470.png!

 


> Free memory allocated by huge column vector
> -------------------------------------------
>
>                 Key: SPARK-44239
>                 URL: https://issues.apache.org/jira/browse/SPARK-44239
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Wan Kun
>            Priority: Major
>         Attachments: image-2023-06-29-12-58-12-256.png, 
> image-2023-06-29-13-03-15-470.png
>
>
> When Spark reads a data file into a WritableColumnVector, the memory 
> allocated by the WritableColumnVector is not freed until the 
> VectorizedColumnReader completes.
> Reusing the allocated array objects saves memory allocation time, but it 
> also holds on to too much unused memory after the current large vector 
> batch has been read.
> Add a memory reserve policy for this scenario that reuses the allocated 
> array objects for small column vectors and frees the memory for huge column 
> vectors.
> !image-2023-06-29-12-58-12-256.png!!image-2023-06-29-13-03-15-470.png!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

