First, let's try to make the terminology abundantly clear, as I for one
have (I think) misinterpreted what has been proposed.

VfsFileSystem: A subclass of
https://github.com/apache/beam/blob/9f81fd299bd32e0d6056a7da9fa994cf74db0ed9/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystem.java

VfsIO: A replacement for
https://github.com/apache/beam/blob/1e84e49e253f8833f28f1268bec3813029f582d0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java
written using Vfs instead of
https://github.com/apache/beam/blob/29859eb54d05b96a9db477e7bb04537510273bd2/sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileSystems.java

Between these two options, VfsFileSystem is the way to go. It will allow us
to use all our existing file sources/sinks (including all the fancy
watching/streaming support from FileIO) with any filesystem supported by
VFS. Long-term, if VFS is good enough (and we'll be able to run direct
experiments comparing HadoopFileSystem with VfsFileSystem-on-Hadoop), we
could consider moving to VFS entirely and even removing the layer of
indirection. VFS is a filesystem, so this is the right level of abstraction
to plug into. Even if it's lacking in some respects, it may still be worth
keeping in parallel with the existing FileSystem implementations long-term
if it has significantly better coverage.

On the other hand, a re-implementation of FileIO on top of VFS seems like a
lot of code duplication (and ongoing maintenance cost) and will be
difficult to build on top of (e.g. the binding of TextIO to FileIO is not
dynamic, unlike the binding of filesystems).
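
As a rough illustration of the VfsFileSystem option, the following is a
minimal Java sketch. Every type in it (SimpleFileSystem, FakeVfsManager,
VfsBackedFileSystem, Registry) is a hypothetical stand-in invented for this
example, not the real Beam or Commons VFS API. It only assumes what is said
above: filesystems are bound dynamically (here, by URI scheme), so existing
IO code can reach a VFS-backed filesystem without being rewritten.

```java
import java.util.HashMap;
import java.util.Map;

public class VfsAdapterSketch {

  /** Hypothetical stand-in for Beam's FileSystem abstraction (not the real API). */
  interface SimpleFileSystem {
    String scheme();
    String read(String uri);
  }

  /** Hypothetical stand-in for a VFS-style manager that resolves URIs to content. */
  static class FakeVfsManager {
    private final Map<String, String> files = new HashMap<>();

    void put(String uri, String content) {
      files.put(uri, content);
    }

    String resolve(String uri) {
      String content = files.get(uri);
      if (content == null) {
        throw new IllegalArgumentException("No such file: " + uri);
      }
      return content;
    }
  }

  /** The "VfsFileSystem" idea: a thin adapter that delegates to the VFS-style backend. */
  static class VfsBackedFileSystem implements SimpleFileSystem {
    private final String scheme;
    private final FakeVfsManager vfs;

    VfsBackedFileSystem(String scheme, FakeVfsManager vfs) {
      this.scheme = scheme;
      this.vfs = vfs;
    }

    @Override
    public String scheme() {
      return scheme;
    }

    @Override
    public String read(String uri) {
      return vfs.resolve(uri);
    }
  }

  /** Hypothetical stand-in for scheme-based lookup of registered filesystems. */
  static class Registry {
    private final Map<String, SimpleFileSystem> byScheme = new HashMap<>();

    void register(SimpleFileSystem fs) {
      byScheme.put(fs.scheme(), fs);
    }

    String read(String uri) {
      int idx = uri.indexOf("://");
      String scheme = idx < 0 ? "file" : uri.substring(0, idx);
      SimpleFileSystem fs = byScheme.get(scheme);
      if (fs == null) {
        throw new IllegalArgumentException("No filesystem registered for scheme: " + scheme);
      }
      return fs.read(uri);
    }
  }

  /** "Existing IO" code: talks only to the registry, unaware of which backend answers. */
  static int countLines(Registry registry, String uri) {
    String content = registry.read(uri);
    return content.isEmpty() ? 0 : content.split("\n").length;
  }

  public static void main(String[] args) {
    FakeVfsManager vfs = new FakeVfsManager();
    vfs.put("sftp://host/data.txt", "a\nb\nc");

    Registry registry = new Registry();
    registry.register(new VfsBackedFileSystem("sftp", vfs));

    System.out.println(countLines(registry, "sftp://host/data.txt"));
  }
}
```

In this sketch countLines plays the role of an existing source: registering
one adapter is enough for it to read from a new scheme. A VfsIO, by
contrast, would correspond to writing a second countLines directly against
FakeVfsManager.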


On Mon, Mar 5, 2018 at 1:05 PM Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:

> Not backing vfs by a filesystem sounds saner so VfsIO is probably the way
> to go. It would be a competitor to FileIO and hopefully its replacement in
> the mid/long term.
>
> What about doing the opposite: implementing a vfs filesystem for all the
> fs we support, potentially enriching vfs if needed? Then we can just drop
> the beam abstraction, from what I read.
>
> On 5 March 2018 20:49, "Reuven Lax" <re...@google.com> wrote:
>
>> The terminology is confusing here, since the existing FileIO is a
>> PTransform. VfsFilesystem would be a better name.
>>
>>
>> On Mon, Mar 5, 2018 at 11:46 AM Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> On Mon, Mar 5, 2018 at 11:38 AM Reuven Lax <re...@google.com> wrote:
>>>
>>>> What about a beam Filesystem impl on top of Vfs as an alternative
>>>> short-term solution? This would allow Vfs to be used with any IO.
>>>>
>>>
>>> Yes, I think this is the VfsIO that was proposed.
>>>
>>>
>>>> On Mon, Mar 5, 2018 at 11:37 AM Robert Bradshaw <rober...@google.com>
>>>> wrote:
>>>>
>>>>>
>>>>> On Mon, Mar 5, 2018 at 11:23 AM Romain Manni-Bucau <
>>>>> rmannibu...@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> 2018-03-05 20:04 GMT+01:00 Chamikara Jayalath <chamik...@google.com>:
>>>>>>
>>>>>>> I assume you mean https://commons.apache.org/proper/commons-vfs/.
>>>>>>>
>>>>>>> I'm not sure if we considered this when we originally implemented
>>>>>>> our own file-system abstraction, but based on a quick look it seems
>>>>>>> like this is Java only.
>>>>>>>
>>>>>>
>>>>>> Yes, java only
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> I think having a similar file-system abstraction for various
>>>>>>> languages is a plus point for Beam. Maybe we should consider a Java
>>>>>>> file-system implementation for VFS?
>>>>>>>
>>>>>>
>>>>>> That can be an option, but given the current complexity I'm not sure
>>>>>> mixing 2 abstractions would help. Maybe just a VfsIO for Java users
>>>>>> would be good enough (thinking out loud).
>>>>>>
>>>>>> What sounds clear to me is that each language will need its own
>>>>>> abstraction, which more or less matches your proposal. However, we can
>>>>>> still make it smooth and easy on the Java side (which will likely stay
>>>>>> mainstream for some years yet) by using vfs as our Java impl instead
>>>>>> of reimplementing the full abstraction. This way we keep our *API* but
>>>>>> we drop the beam *impl* to just reuse VFS.
>>>>>>
>>>>>> PS: for gcs, https://github.com/ltouati/vfs-gcs can be a good example
>>>>>> of how it can work.
>>>>>>
>>>>>
>>>>> I think a VfsIO makes a lot of sense in the short term, and will give
>>>>> us the experience needed to decide if we can move solely to VFS (for
>>>>> Java at least) for implementation, and possibly API in a future major
>>>>> release, in the long run.
>>>>>
>>>>
