[
https://issues.apache.org/jira/browse/FLINK-23354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhu Zhu reassigned FLINK-23354:
-------------------------------
Assignee: Zhilong Hong
> Limit the size of ShuffleDescriptors in PermanentBlobCache on TaskExecutor
> --------------------------------------------------------------------------
>
> Key: FLINK-23354
> URL: https://issues.apache.org/jira/browse/FLINK-23354
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Reporter: Zhilong Hong
> Assignee: Zhilong Hong
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.14.0
>
>
> _This is part 3 of the optimization related to task deployments. For the
> overall description and part 1, see FLINK-23005. For part 2, see
> FLINK-23218._
> Currently, a TaskExecutor uses BlobCache to cache the blobs transported from
> the JobManager. The cached blobs are local files stored on the TaskExecutor. A
> cached blob is not cleaned up until one hour after the related job has
> finished. In FLINK-23218, we are going to distribute the cached
> ShuffleDescriptors via blobs. When a large number of failovers happen, many
> cached blobs accumulate on the local disk and occupy a large amount of disk
> space. In extreme cases, they could exhaust the disk space.
> So we need to add a size limit for the ShuffleDescriptors stored in
> PermanentBlobCache on TaskExecutor, as described in the comments of
> FLINK-23218. The main idea is to add a size limit and delete the blobs in
> LRU order if the size limit is exceeded. Before a blob is cached, the
> TaskExecutor first checks the overall size of the cache. If the overall
> size exceeds the limit, blobs are deleted in LRU order until the limit
> is no longer exceeded. If a deleted blob is needed afterwards, it is
> downloaded again from the HA storage or the blob server.
> The default value of the size limit for the ShuffleDescriptors in
> PermanentBlobCache on TaskExecutor will be 100 MiB.
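The eviction policy described above can be illustrated with a minimal sketch. This is not the code from the pull request: the class `SizeLimitedLruTracker` and its methods `track`, `touch`, and `totalBytes` are hypothetical names, blob keys are simplified to strings, and the sketch only does the size bookkeeping (the caller would delete the returned blobs from disk). It also assumes that the size of the incoming blob counts toward the limit when deciding what to evict.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not Flink's actual class: tracks blob sizes and evicts in LRU order. */
final class SizeLimitedLruTracker {
    private final long sizeLimitBytes;
    private long totalBytes = 0L;

    // accessOrder = true: iteration runs from least to most recently used entry.
    private final LinkedHashMap<String, Long> sizes = new LinkedHashMap<>(16, 0.75f, true);

    SizeLimitedLruTracker(long sizeLimitBytes) {
        this.sizeLimitBytes = sizeLimitBytes;
    }

    /**
     * Registers a blob before it is cached. Returns the keys of the blobs that
     * were evicted from the bookkeeping and should be deleted by the caller.
     */
    List<String> track(String key, long sizeBytes) {
        List<String> evicted = new ArrayList<>();

        // Re-tracking an existing key replaces its old size.
        Long previous = sizes.remove(key);
        if (previous != null) {
            totalBytes -= previous;
        }

        // Evict least recently used blobs until the new blob fits (assumption:
        // the incoming blob's size is included in the check).
        Iterator<Map.Entry<String, Long>> it = sizes.entrySet().iterator();
        while (totalBytes + sizeBytes > sizeLimitBytes && it.hasNext()) {
            Map.Entry<String, Long> eldest = it.next();
            totalBytes -= eldest.getValue();
            evicted.add(eldest.getKey());
            it.remove();
        }

        // A blob larger than the whole limit is still tracked in this sketch.
        sizes.put(key, sizeBytes);
        totalBytes += sizeBytes;
        return evicted;
    }

    /** Marks a blob as recently used; returns false if it is no longer tracked. */
    boolean touch(String key) {
        return sizes.get(key) != null;
    }

    long totalBytes() {
        return totalBytes;
    }
}
```

With the proposed default, the tracker would be created as `new SizeLimitedLruTracker(100L * 1024 * 1024)`. A `false` result from `touch` corresponds to the case where a deleted blob is needed again and has to be re-downloaded.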
--
This message was sent by Atlassian Jira
(v8.3.4#803005)