The design[1] by the original author did define some rename/delete
semantics.
I do believe that the original APIs were very strict: they threw errors in
many situations and had knobs that required the user of the API to handle
more "edge" cases. I filed BEAM-5425[2] because I believe we can make our
API significantly easier to use by making it less strict. We expect our
system to be resilient to failures, and I believe rename/delete semantics
should make it easier to write such code.
For example, if you want to rename a set of files you can't just do:
    // Retry up to three times.
    for (int i = 0; i < 3; i++) {
      try {
        filesystem.rename(srcs, dests, options);
        return;
      } catch (IOException e) {
        // Ignore the failure and try again.
      }
    }
because subsequent calls will fail if any of the files were already renamed.
In most cases the user needs to check which files were renamed and then fix
things up based on how the rename failed.
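This is a minimal sketch of the bookkeeping a caller must do today. It assumes a local filesystem where `java.nio.file.Files.move`, called without `REPLACE_EXISTING`, stands in for a strict rename. That call fails with `NoSuchFileException` if a source is missing and with `FileAlreadyExistsException` if a destination exists. The class and method names are hypothetical and are not Beam APIs. Treating "source gone, destination present" as "already renamed" is also an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical caller-side retry around a strict rename; not a Beam API. */
final class CallerSideRenameRetry {
  private CallerSideRenameRetry() {}

  /** Strict bulk rename: fails if any source is missing or any destination exists. */
  static void strictRename(List<Path> srcs, List<Path> dests) throws IOException {
    for (int i = 0; i < srcs.size(); i++) {
      Files.move(srcs.get(i), dests.get(i));
    }
  }

  /** Retries strictRename, first dropping pairs an earlier attempt already renamed. */
  static void renameWithRetry(List<Path> srcs, List<Path> dests, int maxAttempts)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    IOException lastFailure = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      List<Path> pendingSrcs = new ArrayList<>();
      List<Path> pendingDests = new ArrayList<>();
      for (int i = 0; i < srcs.size(); i++) {
        // Assumption: "source gone, destination present" means this pair
        // was renamed by an earlier, partially successful attempt.
        boolean alreadyRenamed =
            Files.notExists(srcs.get(i)) && Files.exists(dests.get(i));
        if (!alreadyRenamed) {
          pendingSrcs.add(srcs.get(i));
          pendingDests.add(dests.get(i));
        }
      }
      try {
        strictRename(pendingSrcs, pendingDests);
        return;
      } catch (IOException e) {
        lastFailure = e; // Some pairs may have succeeded; re-check next time.
      }
    }
    throw lastFailure;
  }
}
```

Calling `strictRename` again with the original lists after a partial success throws `NoSuchFileException` on the first pair. The caller has to recompute the pending pairs, as `renameWithRetry` does, before trying again.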
It would be much easier if delete(files) and rename(srcs, dests) needed no
flags and could be called with the same lists over and over again, with each
partial success ensuring that subsequent calls make progress. I suspect the
FileSystems[3] class, with its static helpers, was meant to address this.
Unfortunately, I don't have enough time to pay close attention to this space,
and my memory of the reasons behind those past choices is fading.
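This rough sketch shows the alternative, again assuming a local filesystem and using hypothetical names. It is not how Beam's `FileSystems` is implemented. In the sketch, rename treats a pair whose source is gone and whose destination exists as already done, and delete treats a missing file as already deleted. With primitives like these, the naive three-attempt retry loop from the example earlier would make progress on every call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical idempotent rename/delete for a local filesystem; not a Beam API. */
final class IdempotentFileOps {
  private IdempotentFileOps() {}

  /**
   * Renames srcs[i] to dests[i]. Repeating the call with the same lists after
   * a partial failure only performs the renames that are still outstanding.
   */
  static void rename(List<Path> srcs, List<Path> dests) throws IOException {
    if (srcs.size() != dests.size()) {
      throw new IllegalArgumentException("srcs and dests must have the same size");
    }
    for (int i = 0; i < srcs.size(); i++) {
      Path src = srcs.get(i);
      Path dest = dests.get(i);
      if (Files.notExists(src) && Files.exists(dest)) {
        continue; // Assumed renamed by an earlier, partially successful call.
      }
      // Otherwise still strict and non-overwriting: fails if src is missing
      // or if dest already exists.
      Files.move(src, dest);
    }
  }

  /** Deletes every file; files that are already gone count as deleted. */
  static void delete(List<Path> files) throws IOException {
    for (Path file : files) {
      Files.deleteIfExists(file);
    }
  }
}
```

One trade-off is that the skip rule cannot distinguish a rename completed by an earlier attempt from a destination that already existed while someone else removed its source. A real implementation would have to decide whether that ambiguity is acceptable.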
cc @[email protected], since he has also had some interest in this space
in the past.
1: https://docs.google.com/document/d/11TdPyZ9_zmjokhNWM3Id-XJsVG3qel2lhdKTknmZ_7M/edit#
2: https://issues.apache.org/jira/browse/BEAM-5425
3: https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java
On Wed, Sep 19, 2018 at 8:53 AM Ismaël Mejía <[email protected]>
wrote:
> Interesting question. Ideally we should define and document a clear default
> behavior for rename in all Beam filesystems (since there are no options
> allowed in the API).
> HDFS users will probably expect the default rename behavior NOT to
> overwrite (as HDFS does, and because overwriting implies possible data
> loss), but I am not sure whether there is a strong reason for other
> filesystems (e.g. Local) to overwrite by default.
> cc @lukecwik <https://github.com/lukecwik> too for possible extra
> feedback, since the original authors of Beam FileSystems are no longer in
> the project.
[ Full content available at: https://github.com/apache/beam/pull/6289 ]
This message was relayed via gitbox.apache.org for [email protected]