[ https://issues.apache.org/jira/browse/FLINK-24392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451122#comment-17451122 ]
Arvid Heise commented on FLINK-24392:
-------------------------------------
This is blocked on dropping support for Java 8 (newer Trino versions require Java 11), a step on which the community has yet to decide.
> Upgrade presto s3 fs implementation to Trino >= 348
> ---------------------------------------------------
>
> Key: FLINK-24392
> URL: https://issues.apache.org/jira/browse/FLINK-24392
> Project: Flink
> Issue Type: Improvement
> Components: FileSystems
> Affects Versions: 1.14.0
> Reporter: Robert Metzger
> Priority: Major
> Fix For: 1.15.0
>
>
> The Presto s3 filesystem implementation currently shipped with Flink doesn't
> support streaming uploads: all data needs to be materialized into a single
> file on local disk before it can be uploaded. This can lead to situations
> where TaskManagers run out of disk space while creating a savepoint.
> The Hadoop filesystem implementation does support streaming uploads (it uses
> multipart uploads of smaller parts, say 100 MB each, buffered locally; see the
> sketch below), but it makes more API calls, which leads to other issues.
> Trino version >= 348 supports streaming uploads.
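> For reference, "streaming upload" here means roughly the following pattern
> against the S3 API. This is a minimal sketch using the AWS SDK v1; the bucket,
> key, and single fixed-size part are made-up placeholders, not how Flink's
> code is actually structured:
> {code:java}
> import com.amazonaws.services.s3.AmazonS3;
> import com.amazonaws.services.s3.AmazonS3ClientBuilder;
> import com.amazonaws.services.s3.model.*;
>
> import java.io.ByteArrayInputStream;
> import java.util.ArrayList;
> import java.util.List;
>
> public class StreamingUploadSketch {
>     public static void main(String[] args) {
>         AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
>         String bucket = "my-bucket";            // placeholder
>         String key = "savepoints/sp-1/part-0";  // placeholder
>
>         // 1. Start a multipart upload; no payload data is sent yet.
>         String uploadId = s3.initiateMultipartUpload(
>                 new InitiateMultipartUploadRequest(bucket, key)).getUploadId();
>
>         // 2. Upload each ~100 MB buffer as soon as it is full, instead of
>         //    materializing the whole file on local disk first.
>         List<PartETag> etags = new ArrayList<>();
>         byte[] buffer = new byte[100 * 1024 * 1024]; // one part's worth of data
>         UploadPartResult part = s3.uploadPart(new UploadPartRequest()
>                 .withBucketName(bucket)
>                 .withKey(key)
>                 .withUploadId(uploadId)
>                 .withPartNumber(1)
>                 .withInputStream(new ByteArrayInputStream(buffer))
>                 .withPartSize(buffer.length));
>         etags.add(part.getPartETag());
>
>         // 3. Combine the uploaded parts into the final object.
>         s3.completeMultipartUpload(
>                 new CompleteMultipartUploadRequest(bucket, key, uploadId, etags));
>     }
> }
> {code}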
> During experiments, I also noticed that the current Presto s3 fs
> implementation seems to allocate a lot of memory outside the heap when
> shipping large amounts of data, for example while creating a savepoint. On a
> K8s pod with a memory limit of 4000Mi, I was not able to run Flink with a
> "taskmanager.memory.flink.size" above 3000m. In other words, an additional
> 1 GB of memory has to be reserved just for the allocation peaks that occur
> while the Presto s3 fs writes a savepoint. It would be good to confirm this
> behavior and then adjust either the default memory configuration or the
> documentation.
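> Concretely, the limit I ran into corresponds to the following flink-conf.yaml
> values. The jvm-overhead line is only a hypothetical mitigation for reserving
> headroom for the native allocations (raising it also raises the derived
> process size, so the pod limit has to account for it); whether it reliably
> covers the peaks is exactly what needs confirming:
> {code}
> # observed: values above 3000m made the TaskManager exceed the 4000Mi
> # pod limit while the Presto s3 fs was writing a savepoint
> taskmanager.memory.flink.size: 3000m
>
> # hypothetical mitigation: force a larger JVM-overhead budget so the
> # S3 client's off-heap allocations fit under the container limit
> taskmanager.memory.jvm-overhead.min: 1g
> {code}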
> As part of this upgrade, we also need to make sure that the new Presto /
> Trino version does not make substantially more S3 API calls than the current
> version. After switching from the Presto s3 fs to the Hadoop s3 fs, I noticed
> that disposing of an old checkpoint (~100 GB) can take up to 15 minutes. The
> upgraded Presto s3 fs should still be able to dispose of state quickly.
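> One factor worth checking here is whether deletes are batched: the S3 API can
> remove up to 1,000 objects per DeleteObjects call, versus one call per object
> otherwise. Below is a minimal sketch with the AWS SDK v1 (bucket and keys are
> placeholders); whether the slow disposal above is actually caused by
> per-object deletes would still need to be verified:
> {code:java}
> import com.amazonaws.services.s3.AmazonS3;
> import com.amazonaws.services.s3.AmazonS3ClientBuilder;
> import com.amazonaws.services.s3.model.DeleteObjectsRequest;
>
> public class BatchDeleteSketch {
>     public static void main(String[] args) {
>         AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
>         // one API call removes up to 1000 checkpoint files at once,
>         // instead of one DeleteObject request per file
>         s3.deleteObjects(new DeleteObjectsRequest("my-bucket") // placeholder
>                 .withKeys("chk-42/file-1", "chk-42/file-2"));  // placeholders
>     }
> }
> {code}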
--
This message was sent by Atlassian Jira
(v8.20.1#820001)