Hi Brian,

I think the main goal would be to make a Python package that could be pip-installed independently of apache_beam. That goal could be accomplished with option 3 (separating it more clearly in the current build system and release artifacts), which preserves all of the benefits of a monorepo. If it gains enough popularity and contributors outside of the Beam community, then options 1 and 2 could be considered as ways to foster a new community of contributors.
Beam has a lot of great tech in it, and it makes me think of Celery, a much older Python project of a similar ilk that spawned a series of useful independent projects: kombu [1], an AMQP messaging library, and billiard [2], a multiprocessing library.

Obviously, there are a number of pros and cons to consider. The cons are pretty clear: even within a monorepo, it will make the Beam build more complicated. The pros are a bit more abstract. The fileIO project could appeal to a broader audience and act as a signpost for Beam (on PyPI, etc.), thereby increasing awareness of Beam among the kind of cloud-friendly Python developers who would need the fileIO package.

-chad

[1] https://github.com/celery/kombu
[2] https://github.com/celery/billiard

On Thu, May 20, 2021 at 7:57 AM Brian Hulette <bhule...@google.com> wrote:
> That's an interesting idea. What do you mean by its own project? A couple
> of possibilities:
> - Spinning off a new ASF project
> - A separate Beam-governed repository (e.g. apache/beam-filesystems)
> - More clearly separate it in the current build system and release
>   artifacts that allow it to be used independently
>
> Personally I'd be resistant to the first two (I am a Google engineer and I
> like monorepos after all), but I don't see a major problem with the last
> one, except that it gives us another surface to maintain.
>
> Brian
>
> On Wed, May 19, 2021 at 8:38 PM Chad Dombrova <chad...@gmail.com> wrote:
>
>> This is a random idea, but the whole file IO system inside Beam would
>> actually be awesome to extract into its own project. IIRC, it's not
>> particularly tied to Beam.
>>
>> I'm not saying this should be done now, but it'd be nice to keep it in
>> mind as a future goal.
>>
>> -chad
>>
>> On Wed, May 19, 2021 at 10:23 AM Pablo Estrada <pabl...@google.com>
>> wrote:
>>
>>> That would be great to add, Matt. Of course it's important to make this
>>> backwards compatible, but other than that, the addition would be very
>>> welcome.
>>>
>>> On Wed, May 19, 2021 at 9:41 AM Matt Rudary <matt.rud...@twosigma.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> This is a quick sketch of a proposal – I wanted to get a sense of
>>>> whether there's general support for this idea before fleshing it out
>>>> further, getting internal approvals, etc.
>>>>
>>>> I'm working with multiple storage systems that speak the S3 API. I
>>>> would like to support FileIO operations for these storage systems, but
>>>> S3FileSystem hardcodes the s3 scheme (the various systems use different
>>>> URI schemes), and in any case it is impossible to instantiate more than
>>>> one in the current design.
>>>>
>>>> I'd like to refactor the code in org.apache.beam.sdk.io.aws.s3 (and
>>>> maybe …aws.options) somewhat to enable this use case. I haven't worked
>>>> out the details yet, but it will take some thought to make this work in
>>>> a non-hacky way.
>>>>
>>>> Thanks
>>>>
>>>> Matt Rudary
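To make Matt's S3 limitation concrete: instead of one filesystem hardwired to the `s3` scheme, each S3-compatible system could get its own configuration, looked up by the URI scheme of the path. The sketch below is a hypothetical illustration only. `SchemeRegistrySketch`, `SchemeRegistry`, `S3CompatibleConfig`, and the example schemes and endpoints are invented names, not Beam's API, and the actual refactor of `org.apache.beam.sdk.io.aws.s3` may look quite different.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: resolve S3-compatible storage configs by URI scheme. */
public class SchemeRegistrySketch {

  /** Hypothetical per-system settings; a real design would also carry credentials, region, etc. */
  static final class S3CompatibleConfig {
    private final String scheme;
    private final String endpoint;

    S3CompatibleConfig(String scheme, String endpoint) {
      this.scheme = scheme;
      this.endpoint = endpoint;
    }

    String scheme() { return scheme; }

    String endpoint() { return endpoint; }
  }

  /** Maps each URI scheme to its own config instead of hardcoding "s3". */
  static final class SchemeRegistry {
    private final Map<String, S3CompatibleConfig> byScheme = new HashMap<>();

    void register(S3CompatibleConfig config) {
      // Reject duplicate schemes so two systems cannot silently shadow each other.
      if (byScheme.putIfAbsent(config.scheme(), config) != null) {
        throw new IllegalArgumentException("Scheme already registered: " + config.scheme());
      }
    }

    S3CompatibleConfig resolve(String path) {
      // The scheme is the part of the URI before "://", e.g. "minio" in "minio://bucket/key".
      String scheme = URI.create(path).getScheme();
      S3CompatibleConfig config = byScheme.get(scheme);
      if (config == null) {
        throw new IllegalArgumentException("No filesystem registered for scheme: " + scheme);
      }
      return config;
    }
  }

  public static void main(String[] args) {
    SchemeRegistry registry = new SchemeRegistry();
    // Hypothetical schemes and endpoints for two S3-compatible systems.
    registry.register(new S3CompatibleConfig("s3", "https://storage-a.example.com"));
    registry.register(new S3CompatibleConfig("minio", "http://localhost:9000"));

    // Prints the endpoint registered for the "minio" scheme.
    System.out.println(registry.resolve("minio://bucket/data/part-0000").endpoint());
  }
}
```

Keying everything by scheme is what allows several instances to coexist; whether Beam would register them through pipeline options or some other mechanism is left open here, as it is in the proposal.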