[
https://issues.apache.org/jira/browse/FLINK-24392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Metzger updated FLINK-24392:
-----------------------------------
Description:
The Presto S3 filesystem implementation currently shipped with Flink doesn't
support streaming uploads. All data needs to be materialized into a single file
on disk before it can be uploaded.
This can lead to situations where TaskManagers run out of disk space when
creating a savepoint.
The Hadoop filesystem implementation supports streaming uploads (by locally
buffering smaller parts, say 100 MB each, and shipping them via S3 multipart
uploads), but it makes more API calls, leading to other issues.
Trino version >= 348 supports streaming uploads.
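To make the difference concrete, here is a minimal, hedged sketch of the
multipart-upload pattern described above, written against the AWS SDK for Java
v1. It only illustrates the mechanism, it is not the actual Hadoop or Flink
implementation; the bucket, key, and the 100 MB part size are placeholder
assumptions:
{code:java}
// Illustration only: stream data to S3 as buffered ~100 MB parts instead of
// materializing the whole upload as one large file on disk first.
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.PartETag;
import com.amazonaws.services.s3.model.UploadPartRequest;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class MultipartUploadSketch {

    // Placeholder part size; only the last part may be smaller than S3's 5 MB minimum.
    private static final int PART_SIZE = 100 * 1024 * 1024;

    public static void upload(InputStream data, String bucket, String key) throws IOException {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        String uploadId = s3.initiateMultipartUpload(
                new InitiateMultipartUploadRequest(bucket, key)).getUploadId();

        List<PartETag> partETags = new ArrayList<>();
        byte[] buffer = new byte[PART_SIZE];
        int partNumber = 1;
        int filled;
        // Fill one local buffer and ship it as soon as it is full, so at most
        // a single part needs to be held locally at any point in time.
        while ((filled = readFully(data, buffer)) > 0) {
            partETags.add(s3.uploadPart(new UploadPartRequest()
                    .withBucketName(bucket)
                    .withKey(key)
                    .withUploadId(uploadId)
                    .withPartNumber(partNumber++)
                    .withInputStream(new ByteArrayInputStream(buffer, 0, filled))
                    .withPartSize(filled)).getPartETag());
        }
        s3.completeMultipartUpload(
                new CompleteMultipartUploadRequest(bucket, key, uploadId, partETags));
    }

    private static int readFully(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }
}
{code}
With this pattern, at most one buffered part has to be held locally at a time,
instead of materializing the whole upload as a single file on disk before the
upload can start.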
During experiments, I also noticed that the current Presto S3 fs implementation
seems to allocate a lot of memory outside the heap (when shipping large amounts
of data, for example when creating a savepoint). On a K8s pod with a memory
limit of 4000Mi, I was not able to run Flink with a
"taskmanager.memory.flink.size" above 3000m. This means that an additional 1 GB
of memory needs to be allocated just for the allocation peaks that occur while
the Presto S3 fs is taking a savepoint.
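For reference, a sketch of the memory settings from that experiment; the two
values are taken from the observation above, while the exact pod spec layout is
an assumption:
{code}
# Kubernetes container memory limit used in the experiment (spec layout assumed):
#   resources:
#     limits:
#       memory: 4000Mi

# Highest stable value in flink-conf.yaml with the current Presto S3 fs:
taskmanager.memory.flink.size: 3000m
{code}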
As part of this upgrade, we also need to make sure that the new Presto / Trino
version does not make substantially more S3 API calls than the current version.
After switching from the Presto S3 fs to the Hadoop S3 fs, I noticed that
disposing an old checkpoint (~100 GB) can take up to 15 minutes. The upgraded
Presto S3 fs should still be able to dispose state quickly.
was:
The Presto S3 filesystem implementation currently shipped with Flink doesn't
support streaming uploads. All data needs to be materialized into a single file
on disk before it can be uploaded.
This can lead to situations where TaskManagers run out of disk space when
creating a savepoint.
The Hadoop filesystem implementation supports streaming uploads (by locally
buffering smaller parts, say 100 MB each, and shipping them via S3 multipart
uploads), but it makes more API calls, leading to other issues.
Trino 348 supports streaming uploads.
> Upgrade presto s3 fs implementation to Trino >= 348
> ---------------------------------------------------
>
> Key: FLINK-24392
> URL: https://issues.apache.org/jira/browse/FLINK-24392
> Project: Flink
> Issue Type: Improvement
> Components: FileSystems
> Affects Versions: 1.14.0
> Reporter: Robert Metzger
> Priority: Major
> Fix For: 1.15.0
>
>
> The Presto S3 filesystem implementation currently shipped with Flink doesn't
> support streaming uploads. All data needs to be materialized into a single file
> on disk before it can be uploaded.
> This can lead to situations where TaskManagers run out of disk space when
> creating a savepoint.
> The Hadoop filesystem implementation supports streaming uploads (by locally
> buffering smaller parts, say 100 MB each, and shipping them via S3 multipart
> uploads), but it makes more API calls, leading to other issues.
> Trino version >= 348 supports streaming uploads.
> During experiments, I also noticed that the current Presto S3 fs
> implementation seems to allocate a lot of memory outside the heap (when
> shipping large amounts of data, for example when creating a savepoint). On a
> K8s pod with a memory limit of 4000Mi, I was not able to run Flink with a
> "taskmanager.memory.flink.size" above 3000m. This means that an additional
> 1 GB of memory needs to be allocated just for the allocation peaks that occur
> while the Presto S3 fs is taking a savepoint.
> As part of this upgrade, we also need to make sure that the new Presto /
> Trino version does not make substantially more S3 API calls than the current
> version. After switching from the Presto S3 fs to the Hadoop S3 fs, I noticed
> that disposing an old checkpoint (~100 GB) can take up to 15 minutes. The
> upgraded Presto S3 fs should still be able to dispose state quickly.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)