[ https://issues.apache.org/jira/browse/FLINK-24392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451122#comment-17451122 ]
Arvid Heise commented on FLINK-24392:
-------------------------------------
This is blocked on dropping support for Java 8 (newer Trino versions require Java 11), a step on which the community has yet to decide.
> Upgrade presto s3 fs implementation to Trino >= 348
> ---------------------------------------------------
>
> Key: FLINK-24392
> URL: https://issues.apache.org/jira/browse/FLINK-24392
> Project: Flink
> Issue Type: Improvement
> Components: FileSystems
> Affects Versions: 1.14.0
> Reporter: Robert Metzger
> Priority: Major
> Fix For: 1.15.0
>
>
> The Presto s3 filesystem implementation currently shipped with Flink doesn't
> support streaming uploads: all data needs to be materialized into a single
> file on local disk before it can be uploaded. This can lead to situations
> where TaskManagers run out of disk space while creating a savepoint.
> The Hadoop filesystem implementation does support streaming uploads (it uses
> multipart uploads of smaller parts, say 100 MB each, buffered locally; see the
> sketch below), but it makes more API calls, which leads to other issues.
> Trino version >= 348 supports streaming uploads.
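> For reference, "streaming upload" here means roughly the following pattern
> against the S3 API. This is a minimal sketch using the AWS SDK v1; the bucket,
> key, and single fixed-size part are made-up placeholders, not how Flink's
> code is actually structured:
> {code:java}
> import com.amazonaws.services.s3.AmazonS3;
> import com.amazonaws.services.s3.AmazonS3ClientBuilder;
> import com.amazonaws.services.s3.model.*;
>
> import java.io.ByteArrayInputStream;
> import java.util.ArrayList;
> import java.util.List;
>
> public class StreamingUploadSketch {
>     public static void main(String[] args) {
>         AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
>         String bucket = "my-bucket";            // placeholder
>         String key = "savepoints/sp-1/part-0";  // placeholder
>
>         // 1. Start a multipart upload; no payload data is sent yet.
>         String uploadId = s3.initiateMultipartUpload(
>                 new InitiateMultipartUploadRequest(bucket, key)).getUploadId();
>
>         // 2. Upload each ~100 MB buffer as soon as it is full, instead of
>         //    materializing the whole file on local disk first.
>         List<PartETag> etags = new ArrayList<>();
>         byte[] buffer = new byte[100 * 1024 * 1024]; // one part's worth of data
>         UploadPartResult part = s3.uploadPart(new UploadPartRequest()
>                 .withBucketName(bucket)
>                 .withKey(key)
>                 .withUploadId(uploadId)
>                 .withPartNumber(1)
>                 .withInputStream(new ByteArrayInputStream(buffer))
>                 .withPartSize(buffer.length));
>         etags.add(part.getPartETag());
>
>         // 3. Combine the uploaded parts into the final object.
>         s3.completeMultipartUpload(
>                 new CompleteMultipartUploadRequest(bucket, key, uploadId, etags));
>     }
> }
> {code}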
> During experiments, I also noticed that the current Presto s3 fs
> implementation seems to allocate a lot of memory outside the heap when
> shipping large amounts of data, for example while creating a savepoint. On a
> K8s pod with a memory limit of 4000Mi, I was not able to run Flink with a
> "taskmanager.memory.flink.size" above 3000m. In other words, an additional
> 1 GB of memory has to be reserved just for the allocation peaks that occur
> while the Presto s3 fs writes a savepoint. It would be good to confirm this
> behavior and then adjust either the default memory configuration or the
> documentation.
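> Concretely, the limit I ran into corresponds to the following flink-conf.yaml
> values. The jvm-overhead line is only a hypothetical mitigation for reserving
> headroom for the native allocations (raising it also raises the derived
> process size, so the pod limit has to account for it); whether it reliably
> covers the peaks is exactly what needs confirming:
> {code}
> # observed: values above 3000m made the TaskManager exceed the 4000Mi
> # pod limit while the Presto s3 fs was writing a savepoint
> taskmanager.memory.flink.size: 3000m
>
> # hypothetical mitigation: force a larger JVM-overhead budget so the
> # S3 client's off-heap allocations fit under the container limit
> taskmanager.memory.jvm-overhead.min: 1g
> {code}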
> As part of this upgrade, we also need to make sure that the new Presto /
> Trino version does not make substantially more S3 API calls than the current
> version. After switching from the Presto s3 fs to the Hadoop s3 fs, I noticed
> that disposing of an old checkpoint (~100 GB) can take up to 15 minutes. The
> upgraded Presto s3 fs should still be able to dispose of state quickly.
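> One factor worth checking here is whether deletes are batched: the S3 API can
> remove up to 1,000 objects per DeleteObjects call, versus one call per object
> otherwise. Below is a minimal sketch with the AWS SDK v1 (bucket and keys are
> placeholders); whether the slow disposal above is actually caused by
> per-object deletes would still need to be verified:
> {code:java}
> import com.amazonaws.services.s3.AmazonS3;
> import com.amazonaws.services.s3.AmazonS3ClientBuilder;
> import com.amazonaws.services.s3.model.DeleteObjectsRequest;
>
> public class BatchDeleteSketch {
>     public static void main(String[] args) {
>         AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
>         // one API call removes up to 1000 checkpoint files at once,
>         // instead of one DeleteObject request per file
>         s3.deleteObjects(new DeleteObjectsRequest("my-bucket") // placeholder
>                 .withKeys("chk-42/file-1", "chk-42/file-2"));  // placeholders
>     }
> }
> {code}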
--
This message was sent by Atlassian Jira
(v8.20.1#820001)