[
https://issues.apache.org/jira/browse/FLINK-23354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated FLINK-23354:
-----------------------------------
Labels: pull-request-available (was: )
> Limit the size of blob cache on TaskExecutor
> --------------------------------------------
>
> Key: FLINK-23354
> URL: https://issues.apache.org/jira/browse/FLINK-23354
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Reporter: Zhilong Hong
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.14.0
>
>
> Currently a TaskExecutor uses BlobCache to cache the blobs transported from
> the JobManager. The cached blobs are stored as local files on the
> TaskExecutor. The blob cache is not cleaned up until one hour after the
> related job has finished. At present, JobInformation and TaskInformation are
> transported via blob. If a lot of jobs are submitted, the blob cache will
> occupy a large amount of disk space. In FLINK-23218, we are going to
> distribute the cached ShuffleDescriptors via blob. When a large number of
> failovers happen, many cached blobs will be stored on local disk. In extreme
> cases, the blob cache could exhaust the disk space.
> So we need to add a size limit for the blob cache on TaskExecutor, as
> described in the comments of FLINK-23218. The main idea is to add a size
> limit and delete blobs in LRU order if the size limit is exceeded. Before
> a blob item is cached, the TaskExecutor will first check the overall size of
> the cache. If the overall size exceeds the limit, blobs will be deleted in
> LRU order until the limit is no longer exceeded. If a deleted blob is needed
> afterwards, it will be downloaded from the blob server again.
> The default value of the size limit of the blob cache on TaskExecutor will be
> 10GiB.
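
The LRU eviction described above could be sketched roughly as follows. This is only an illustration, not the implementation from the pull request: the class name LruBlobSizeLimiter and its methods are hypothetical, blob keys are plain strings, and the sketch assumes the check is "would adding the new blob exceed the limit". It only does the bookkeeping and returns the keys to evict; deleting the files from disk is left to the caller.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical size bookkeeping for a blob cache; not Flink's actual class. */
class LruBlobSizeLimiter {
    private final long sizeLimitBytes;
    private long totalBytes;

    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, Long> blobSizes =
            new LinkedHashMap<>(16, 0.75f, true);

    LruBlobSizeLimiter(long sizeLimitBytes) {
        this.sizeLimitBytes = sizeLimitBytes;
    }

    /**
     * Registers a blob that is about to be cached and returns the keys of the
     * blobs that should be deleted first, in LRU order.
     */
    List<String> track(String blobKey, long sizeBytes) {
        List<String> evicted = new ArrayList<>();

        // Re-registering an existing key replaces its old size.
        Long previous = blobSizes.remove(blobKey);
        if (previous != null) {
            totalBytes -= previous;
        }

        // Evict least recently used blobs until the new blob fits
        // (assumption: the limit applies to the size after insertion).
        Iterator<Map.Entry<String, Long>> it = blobSizes.entrySet().iterator();
        while (totalBytes + sizeBytes > sizeLimitBytes && it.hasNext()) {
            Map.Entry<String, Long> eldest = it.next();
            totalBytes -= eldest.getValue();
            evicted.add(eldest.getKey());
            it.remove();
        }

        blobSizes.put(blobKey, sizeBytes);
        totalBytes += sizeBytes;
        return evicted;
    }

    /** Marks a cached blob as recently used. */
    void touch(String blobKey) {
        // get() on an access-ordered LinkedHashMap moves the entry to the end.
        blobSizes.get(blobKey);
    }

    long totalBytes() {
        return totalBytes;
    }
}
```

With a limit of 10 bytes, tracking blobs "a" and "b" of 4 bytes each, touching "a", and then tracking a 4-byte blob "c" would evict only "b", since "a" was used more recently. A blob evicted this way would simply be downloaded from the blob server again on its next use.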