On 5 Mar 2018 at 22:26, "Robert Bradshaw" <[email protected]> wrote:
First, let's try to make the terminology abundantly clear, as I for one have (I think) misinterpreted what has been proposed.

VfsFileSystem: A subclass of https://github.com/apache/beam/blob/9f81fd299bd32e0d6056a7da9fa994cf74db0ed9/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystem.java

VfsIO: A replacement for https://github.com/apache/beam/blob/1e84e49e253f8833f28f1268bec3813029f582d0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java written using Vfs instead of https://github.com/apache/beam/blob/29859eb54d05b96a9db477e7bb04537510273bd2/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java

Ack

Between these two options, VfsFileSystem is the way to go. It will allow us to use all our existing File sources/sinks (including all the fancy watching/streaming support from FileIO) with any filesystem supported by Vfs. Long-term, if VFS is good enough (and we'll be able to do direct experiments of HadoopFileSystem vs. VfsFileSystem-on-Hadoop) we could consider moving to VFS entirely and even removing the layer of indirection.

Vfs is a filesystem, so this is the right level of abstraction to plug into. Even if it's lacking in some respects, it may still be worth keeping in parallel to the existing FileSystem implementations long-term if it has significantly better coverage.

Ok

On the other hand, a re-implementation of FileIO on top of Vfs seems like a lot of duplication of code (and ongoing maintenance cost) and will be difficult to build on top of (e.g. the binding of TextIO to FileIO is not dynamic like the binding of filesystems).

Well, it shouldn't. Let me clarify my view: we, as the ASF and not just Beam, can make both projects grow from that work and become more mature and more interoperable with the existing ecosystem (who implements a Beam filesystem when providing a new filesystem?). Interestingly, recent Java versions have a filesystem abstraction too, but that one is harder to evolve for our needs.
The high-level goal is to keep it ecosystem-friendly and not create yet another one.

On Mon, Mar 5, 2018 at 1:05 PM Romain Manni-Bucau <[email protected]> wrote:

> Not backing vfs by a filesystem sounds saner so VfsIO is probably the way
> to go. It would be a FileIO competitor and hopefully a replacement in the
> mid/long term.
>
> What about doing the opposite: implementing a vfs filesystem for all the
> fs we support, potentially enriching vfs if needed? Then we can just drop
> the beam abstraction, from what I read.
>
> On 5 Mar 2018 at 20:49, "Reuven Lax" <[email protected]> wrote:
>
>> The terminology is confusing here, since the existing FileIO is a
>> PTransform. VfsFilesystem would be a better name.
>>
>> On Mon, Mar 5, 2018 at 11:46 AM Robert Bradshaw <[email protected]> wrote:
>>
>>> On Mon, Mar 5, 2018 at 11:38 AM Reuven Lax <[email protected]> wrote:
>>>
>>>> What about a beam Filesystem impl on top of Vfs as an alternative
>>>> short-term solution? This would allow Vfs to be used with any IO.
>>>
>>> Yes, I think this is the VfsIO that was proposed.
>>>
>>>> On Mon, Mar 5, 2018 at 11:37 AM Robert Bradshaw <[email protected]> wrote:
>>>>
>>>>> On Mon, Mar 5, 2018 at 11:23 AM Romain Manni-Bucau <[email protected]> wrote:
>>>>>
>>>>>> 2018-03-05 20:04 GMT+01:00 Chamikara Jayalath <[email protected]>:
>>>>>>
>>>>>>> I assume you mean https://commons.apache.org/proper/commons-vfs/.
>>>>>>>
>>>>>>> I'm not sure if we considered this when we originally implemented
>>>>>>> our own file-system abstraction, but based on a quick look it seems
>>>>>>> like this is Java only.
>>>>>>
>>>>>> Yes, java only
>>>>>>
>>>>>>> I think having a similar file-system abstraction for various
>>>>>>> languages is a plus point for Beam. Maybe we should consider a Java
>>>>>>> file-system implementation for VFS?
>>>>>>
>>>>>> Can be an option but when I see the current complexity I'm not sure
>>>>>> mixing 2 abstractions would help, maybe just a VfsIO for java users
>>>>>> would be good enough - thinking out loud.
>>>>>>
>>>>>> What sounds clear to me is that each language will need its own
>>>>>> abstraction - which kind of matches your proposal. However we can
>>>>>> still make it smooth and easy on the java side - which will likely
>>>>>> stay mainstream for some years yet - using vfs as our java impl
>>>>>> instead of reimplementing the full abstraction? This way we keep
>>>>>> our *API* but we drop the beam *impl* to just reuse VFS.
>>>>>>
>>>>>> PS: for gcs https://github.com/ltouati/vfs-gcs can be a good example
>>>>>> of how it can work.
>>>>>
>>>>> I think a VfsIO makes a lot of sense in the short term, and will give
>>>>> us the experience needed to decide if we can move solely to VFS (for
>>>>> Java at least) for implementation, and possibly API in a future major
>>>>> release, in the long run.
