On Fri, Jul 30, 2021 at 4:58 AM Antoine Pitrou <anto...@python.org> wrote:
>
> On 29/07/2021 at 23:01, Weston Pace wrote:
> > In reviewing the RADOS PR (which I think is very cool) I am running
> > into some interesting questions that might be good to flesh out here.
> >
> > The first of which is related to the scope of the GitHub repo. For
> > context the RADOS PR introduces a Ceph object class (a plugin for
> > CephFS, a cloud based file system) called Skyhook which is a
> > standalone artifact that depends on Arrow and is installed into CephFS
> > servers. An argument could be made that such an artifact does not
> > belong in the Arrow repo since it could conceivably be hosted in its
> > own repository.
> >
> > On the other hand, the current description for the repo is "Apache
> > Arrow is a multi-language toolbox for accelerated data interchange and
> > in-memory processing". This extension doesn't necessarily have a home
> > elsewhere (i.e. I don't think Ceph hosts object classes) and it is
> > needed by the datasets module (the topic of a later email) so I think
> > it could be considered a tool. Also, there is some precedent with
> > tools like crossbow, plasma or extensions with 3rd party libraries such
> > as pandas, orc, etc.
> >
> > So noodling on this I would think a good starting point for criteria
> > to be eligible for the Git repo is:
> >
> > * It doesn't have a good home elsewhere
> > * The authors are willing to have it Apache licensed and be subject
> >   to Apache Arrow's ownership
> > * There are integration tests ensuring the tool is functioning
> > * Someone is maintaining the tool and the integration tests
> > * One of:
> >   * The tool integrates Arrow with a 3rd party library
> >   * The tool is used by Arrow (e.g. crossbow)
>
> I think these criteria must also include an evaluation of the
> maintenance and packaging burden, and a real commitment from the
> original authors to participate in in-tree maintenance (with an emphasis
> on *in-tree*, because in other contexts I've seen people integrate a
> sizable contribution into a large open source project, only to continue
> maintaining it in the original repo and disregard the social dynamics of
> the "large open source project").
I do agree with this. We don't want people to "throw code over the wall".
I see this as a strategy for community development / growth.

> Intuitively, I'd say adding a Ceph integration layer in the Arrow repo
> pushes maintenance and expertise requirements beyond the capabilities of
> our current team. But I may be mistaken.

You could be right, but it can also serve as a test-run for these
principles. If it doesn't work out, or it's causing issues, we aren't
committing ourselves to maintaining it without the involvement of the new
code authors.

> Regards
>
> Antoine.