On Fri, Jul 30, 2021 at 4:58 AM Antoine Pitrou <anto...@python.org> wrote:
>
>
> On 29/07/2021 at 23:01, Weston Pace wrote:
> > In reviewing the RADOS PR (which I think is very cool) I am running
> > into some interesting questions that might be good to flesh out here.
> >
> > The first of which is related to the scope of the GitHub repo.  For
> > context the RADOS PR introduces a Ceph object class (a plugin for
> > CephFS, a cloud-based file system) called Skyhook which is a
> > standalone artifact that depends on Arrow and is installed into CephFS
> > servers.  An argument could be made that such an artifact does not
> > belong in the Arrow repo since it could conceivably be hosted in its
> > own repository.
> >
> > On the other hand, the current description for the repo is "Apache
> > Arrow is a multi-language toolbox for accelerated data interchange and
> > in-memory processing".  This extension doesn't necessarily have a home
> > elsewhere (i.e. I don't think Ceph hosts object classes) and it is
> > needed by the datasets module (the topic of a later email) so I think
> > it could be considered a tool.  Also, there is some precedent with
> > tools like crossbow, plasma or extensions with 3rd party libraries such
> > as pandas, orc, etc.
> >
> > So noodling on this I would think a good starting point for criteria
> > to be eligible for the Git repo is:
> >
> >   * It doesn't have a good home elsewhere
> >   * The authors are willing to have it Apache licensed and be subject
> > to Apache Arrow's ownership
> >   * There are integration tests ensuring the tool is functioning
> >   * Someone is maintaining the tool and the integration tests
> >   * One of:
> >      * The tool integrates Arrow with a 3rd party library
> >      * The tool is used by Arrow (e.g. crossbow)
>
> I think these criteria must also include an evaluation of the
> maintenance and packaging burden, and a real commitment from the
> original authors to participate in in-tree maintenance (with an emphasis
> on *in-tree*, because in other contexts I've seen people integrate a
> sizable contribution into a large open source project, only to continue
> maintaining it in the original repo and disregard the social dynamics of
> the "large open source project").

I do agree with this. We don't want people to "throw code over the
wall". I see this as a strategy for community development / growth.

> Intuitively, I'd say adding a Ceph integration layer in the Arrow repo
> pushes maintenance and expertise requirements beyond the capabilities of
> our current team.  But I may be mistaken.

You could be right, but it can also serve as a test-run for these
principles. If it doesn't work out, or it's causing issues, we aren't
committing ourselves to maintaining it without the involvement of the
new code authors.

> Regards
>
> Antoine.
