Hi,

Thanks for the proposal, Yun. I think that's a good idea, and it could
solve the issue you mentioned (FLINK-26590) in many cases (though not all,
depending on deletion speed; but in practice it may be enough).

Having a separate interface (BulkDeletingFileSystem) would probably help in
implementing the feature incrementally (i.e. FS by FS, rather than all
at once), although the same can be achieved by adding supportsBulkDelete().
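
To make the supportsBulkDelete() option a bit more concrete, here is a
minimal sketch of how it might look. It is only an illustration under
assumptions: SimpleFileSystem is a hypothetical, simplified stand-in for
Flink's FileSystem (paths are plain strings), and bulkDelete() as well as
the two example implementations are made-up names, not an existing API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical, simplified stand-in for Flink's FileSystem (paths are plain strings). */
interface SimpleFileSystem {

    /** Mirrors the shape of the existing single-file delete call. */
    boolean delete(String path, boolean recursive) throws IOException;

    /** Assumed capability flag: a file system opts in by overriding this. */
    default boolean supportsBulkDelete() {
        return false;
    }

    /** Assumed bulk call: the default falls back to a loop of single deletes. */
    default void bulkDelete(Collection<String> paths) throws IOException {
        for (String path : paths) {
            delete(path, false);
        }
    }
}

/** File system without native bulk delete: inherits the looping default. */
final class LoopingFileSystem implements SimpleFileSystem {
    final List<String> singleDeletes = new ArrayList<>();

    @Override
    public boolean delete(String path, boolean recursive) {
        singleDeletes.add(path);
        return true;
    }
}

/** File system with native bulk delete: one call covers the whole collection. */
final class BulkFileSystem implements SimpleFileSystem {
    final List<String> deleted = new ArrayList<>();
    int bulkCalls = 0;

    @Override
    public boolean delete(String path, boolean recursive) {
        return deleted.add(path);
    }

    @Override
    public boolean supportsBulkDelete() {
        return true;
    }

    @Override
    public void bulkDelete(Collection<String> paths) {
        bulkCalls++; // stands in for a single multi-object request to the store
        deleted.addAll(paths);
    }
}
```

With this shape, each file system can be migrated on its own by overriding
the two default methods, and callers can check supportsBulkDelete() to
decide whether grouping paths is worthwhile.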

Regarding BulkFileDeleter, I think it's required in some form, because
grouping must be done before calling FS.delete(), even if it accepts a
collection.

Have you considered limiting the batch sizes for deletions?
For example, S3 has a limit of 1000 keys per request [1], but the SDK
handles it automatically, IIUC.
If we don't rely on this handling and implement our own, the batches could
also be deleted in parallel. This can be an initial step from which all
the file systems would benefit, even those without bulk-delete support.
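
A rough sketch of what I mean by our own batching, under assumptions:
BatchedDeleteSketch, partition() and deleteInBatches() are hypothetical
names, the deleteBatch callback stands in for whatever the file system
offers (a bulk call, or a plain loop of single deletes), and error handling
is left out.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

/** Hypothetical helper: splits deletions into bounded batches and runs them in parallel. */
final class BatchedDeleteSketch {

    /** Splits the paths into consecutive batches of at most maxBatchSize elements. */
    static <T> List<List<T>> partition(Collection<T> paths, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<T> all = new ArrayList<>(paths);
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < all.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, all.size());
            batches.add(new ArrayList<>(all.subList(start, end)));
        }
        return batches;
    }

    /**
     * Submits one task per batch to the executor and returns a future that
     * completes once every batch has been handed to deleteBatch.
     */
    static <T> CompletableFuture<Void> deleteInBatches(
            Collection<T> paths,
            int maxBatchSize,
            Consumer<List<T>> deleteBatch,
            Executor executor) {
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (List<T> batch : partition(paths, maxBatchSize)) {
            futures.add(CompletableFuture.runAsync(() -> deleteBatch.accept(batch), executor));
        }
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
    }
}
```

For S3 the batch size would be capped at 1000; for a file system without
bulk-delete support the callback could simply loop over the batch, so it
would still gain from the parallelism.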

[1]
https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html

Regards,
Roman


On Thu, Jun 30, 2022 at 5:10 PM Piotr Nowojski <pnowoj...@apache.org> wrote:

> Hi,
>
> Yes, I know that you cannot use recursive deletes for incremental
> checkpoints, and I didn't suggest it anywhere. I just pointed out that I
> would expect multi/bulk deletes to supersede the recursive deletes
> feature, assuming a good underlying implementation.
> Also, I'm not surprised that multi deletes can be faster. I would
> expect/hope for that. I've just raised the point that they don't have to
> be; it depends on the underlying file system. However, in contrast to
> recursive deletes, I wouldn't expect multi deletes to be potentially
> slower.
>
> Re Dawid's PoC: I'm not sure/I don't remember why he proposed
> `BulkDeletingFileSystem` over adding a default method to the FileSystem
> interface, but it seems to me like a minor point. The majority of Dawid's
> PR is about the `BulkFileDeleter` interface, not `BulkDeletingFileSystem`,
> so it is about how to use bulk deletes inside Flink, not how to implement
> them on the FileSystem side. Do you maybe have a concrete design proposal
> for this feature?
>
> Best,
> Piotrek
>
> On Thu, Jun 30, 2022 at 15:12 Yun Tang <myas...@live.com> wrote:
>
> > Hi Piotr,
> >
> > As I said in the original email, you cannot delete folders recursively
> > for incremental checkpoints. And if you take a close look at the original
> > email, I have shared the experimental results, which showed a 29x
> > improvement:
> > "A simple experiment shows that deleting 1000 objects of 5MB each
> > costs 39494ms with a for-loop of single delete operations, and
> > drops to 1347ms when using the multi-delete API in Tencent Cloud."
> >
> > I think I can leverage some ideas from Dawid's work. And as I said, I
> > would introduce the multi-delete API to the original FileSystem class
> > instead of introducing another BulkDeletingFileSystem, which makes the
> > file system abstraction closer to the modern cloud-based environment.
> >
> > Best
> > Yun Tang
> > ________________________________
> > From: Piotr Nowojski <pnowoj...@apache.org>
> > Sent: Thursday, June 30, 2022 18:25
> > To: dev <dev@flink.apache.org>; Dawid Wysakowicz <dwysakow...@apache.org>
> > Subject: Re: [DISCUSS] Introduce multi delete API to Flink's FileSystem
> > class
> >
> > Hi,
> >
> > I presume this would mostly supersede the recursive deletes [1]? I
> > remember an argument that the recursive deletes were not obviously
> > better, even if the underlying FS supported them. I'm not saying that
> > this is a counterargument against this effort, since every FileSystem
> > could decide on its own whether to use the multi delete call or not. But
> > I think at the very least it should be benchmarked/compared whether
> > implementing it for a particular FS makes sense or not.
> >
> > Also, there seems to be a similar (abandoned?) effort from Dawid on
> > bulk deletes, with a "BulkDeletingFileSystem" [2]. Isn't this basically
> > the same thing that you are proposing, Yun Tang?
> >
> > Best,
> > Piotrek
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-13856
> > [2] https://issues.apache.org/jira/browse/FLINK-13856?focusedCommentId=17481712&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17481712
> >
> > On Thu, Jun 30, 2022 at 11:45 Zakelly Lan <zakelly....@gmail.com> wrote:
> >
> > > Hi Yun,
> > >
> > > Thanks for bringing this into discussion.
> > > I'm +1 to this idea.
> > > And IIUC, Flink implements the OSS and S3 filesystems based on the
> > > Hadoop filesystem interface, which does not provide a multi-delete
> > > API, so it may take some effort to implement this.
> > >
> > > Best,
> > > Zakelly
> > >
> > > On Thu, Jun 30, 2022 at 5:36 PM Martijn Visser <martijnvis...@apache.org>
> > > wrote:
> > >
> > > > Hi Yun Tang,
> > > >
> > > > +1 for addressing this problem and your approach.
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > >
> > > > On Thu, Jun 30, 2022 at 11:12 Feifan Wang <zoltar9...@163.com> wrote:
> > > >
> > > > > Thanks a lot for the proposal, @Yun Tang! It sounds great and I
> > > > > can't find any reason not to make this improvement.
> > > > >
> > > > >
> > > > > ——————————————
> > > > > Name: Feifan Wang
> > > > > Email: zoltar9...@163.com
> > > > >
> > > > >
> > > > > ---- Replied Message ----
> > > > > | From | Yun Tang<myas...@live.com> |
> > > > > | Date | 06/30/2022 16:56 |
> > > > > | To | dev@flink.apache.org<dev@flink.apache.org> |
> > > > > | Subject | [DISCUSS] Introduce multi delete API to Flink's FileSystem class |
> > > > > Hi guys,
> > > > >
> > > > > As more and more teams move to cloud-based environments, cloud
> > > > > object storage has become the de facto technical standard for big
> > > > > data ecosystems. From our experience, the performance of
> > > > > writing/deleting objects in object storage can vary from call to
> > > > > call. The FLIP for the changelog state backend ran experiments
> > > > > writing the same data multiple times [1], which showed that p999
> > > > > latency could be 8x the p50 latency. This is also true for delete
> > > > > operations.
> > > > >
> > > > > Currently, after the introduction of the checkpoint backpressure
> > > > > mechanism [2], a newly triggered checkpoint can be delayed when
> > > > > old checkpoints are not cleaned up fast enough [3].
> > > > > Moreover, with incremental checkpoints, Flink's checkpoint cleanup
> > > > > mechanism cannot use the folder-deletion API to speed up the
> > > > > procedure [4]. The slowdown is especially noticeable in cloud object
> > > > > storage, and almost all object storage SDKs have a multi-delete API
> > > > > to accelerate deletion, e.g. AWS S3 [5], Aliyun OSS [6], and
> > > > > Tencentyun COS [7].
> > > > > A simple experiment shows that deleting 1000 objects of 5MB each
> > > > > costs 39494ms with a for-loop of single delete operations, and
> > > > > drops to 1347ms when using the multi-delete API in Tencent Cloud.
> > > > >
> > > > > However, Flink's FileSystem API is modeled on HDFS's FileSystem API
> > > > > and lacks such a multi-delete API, which makes it somewhat outdated
> > > > > in cloud-based environments.
> > > > > Thus I suggest adding such a multi-delete API to Flink's
> > > > > FileSystem [8] class; file systems that do not support a
> > > > > multi-delete feature will fall back to a for-loop of single deletes.
> > > > > By doing so, we can at least accelerate the discarding of
> > > > > checkpoints in cloud environments.
> > > > >
> > > > > WDYT?
> > > > >
> > > > >
> > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints#FLIP158:Generalizedincrementalcheckpoints-DFSwritelatency
> > > > > [2] https://issues.apache.org/jira/browse/FLINK-17073
> > > > > [3] https://issues.apache.org/jira/browse/FLINK-26590
> > > > > [4] https://github.com/apache/flink/blob/1486fee1acd9cd1e340f6d2007f723abd20294e5/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CompletedCheckpoint.java#L315
> > > > > [5] https://docs.aws.amazon.com/AmazonS3/latest/userguide/delete-multiple-objects.html
> > > > > [6] https://www.alibabacloud.com/help/en/object-storage-service/latest/delete-objects-8#section-v6n-zym-tax
> > > > > [7] https://intl.cloud.tencent.com/document/product/436/44018#delete-objects-in-batch
> > > > > [8] https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/fs/FileSystem.java
> > > > >
> > > > >
> > > > > Best
> > > > > Yun Tang
> > > > >
> > > > >
> > > >
> > >
> >
>
