[ 
https://issues.apache.org/jira/browse/SPARK-33753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-33753:
---------------------------
    Attachment: current_job_finish_time.png

> Reduce the memory footprint and gc of the cache (hadoopJobMetadata)
> -------------------------------------------------------------------
>
>                 Key: SPARK-33753
>                 URL: https://issues.apache.org/jira/browse/SPARK-33753
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.1
>            Reporter: dzcxzl
>            Priority: Minor
>         Attachments: current_job_finish_time.png
>
>
> HadoopRDD uses a soft-reference map to cache jobconfs (rdd_id -> jobconf).
>  When the driver reads a large number of Hive partitions,
> HadoopRDD.getPartitions creates many jobconfs and adds them to the cache.
>  Each executor also creates a jobconf, adds it to the cache, and shares it
> among executors.
> The jobconfs accumulating in the driver-side cache increase memory pressure.
> When the driver memory configuration is low, full GCs occur frequently,
> because soft references are only cleared under memory pressure, yet these
> jobconfs are hardly ever reused.
> For example, with spark.driver.memory=2560m, about 14,000 partitions are
> read and each jobconf is about 96 KB, so the cache alone holds roughly
> 14,000 x 96 KB, i.e. about 1.3 GB.
> The following is a before/after comparison: full GC time dropped from 62 s
> to 0.8 s, the number of full GCs dropped from 31 to 5, and the driver used
> less memory (Old Gen 1.667 GB -> 968 MB).
>  
> Current:
> !image-2020-12-11-16-17-28-991.png!
> jstat -gcutil PID 2s
> !image-2020-12-11-16-08-53-656.png!
> !image-2020-12-11-16-10-07-363.png!
>  
> After changing softValues to weakValues:
> !image-2020-12-11-16-11-26-673.png!
> !image-2020-12-11-16-11-35-988.png!
> !image-2020-12-11-16-12-22-035.png!
>  
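
The difference between the two reference strengths can be sketched with the
JDK's own `java.lang.ref` types. This is a minimal, illustrative cache, not
Spark's actual hadoopJobMetadata (which is built with Guava's cache, where
the proposal swaps softValues() for weakValues()); the class and key names
here are made up for the example.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of a weak-values cache (illustrative, not Spark code).
// With weak references, a value whose last strong reference is gone becomes
// eligible at the very next GC; a soft reference would instead be retained
// until the JVM is under memory pressure -- which is why rarely-reused
// jobconfs pile up in the driver's Old Gen.
public class WeakValueCache {
    private final ConcurrentHashMap<String, WeakReference<Object>> map =
            new ConcurrentHashMap<>();

    public void put(String key, Object value) {
        map.put(key, new WeakReference<>(value));
    }

    public Object get(String key) {
        WeakReference<Object> ref = map.get(key);
        return ref == null ? null : ref.get(); // null once the value was collected
    }

    public static void main(String[] args) {
        WeakValueCache cache = new WeakValueCache();
        Object jobConf = new Object(); // stands in for a per-RDD JobConf
        cache.put("rdd_1", jobConf);
        // While a strong reference (jobConf) is held, the entry is retrievable.
        System.out.println(cache.get("rdd_1") == jobConf); // true
        // Once jobConf is dropped, the next GC may clear the entry immediately,
        // instead of waiting for a near-OOM situation as soft values do.
    }
}
```

The trade-off is that weakly-held jobconfs are reclaimed sooner, so a hit is
less likely; per the numbers above, the cache was hardly being reused anyway,
and the GC savings dominate.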



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
