> Maybe to take a step back - why do we want this in the Arrow
> repositories/under Arrow governance?
I think this is the important question. What is the goal here?

If the goal is to help spread awareness then we can link to a repo
somewhere (e.g. a "projects that use Arrow" section or something). For
example, I could eventually see something like [1] for ADBC.

If the goal is to share some kind of CI infrastructure burden (e.g.
ensure a library runs everywhere that Arrow can run) then the contrib
repo might be more useful than a repo-per-project but I think we'll need
some more general discussion on how to make this happen.

If the goal is to share maintenance / development cost or find new
developers then I don't think any approach works. Most Arrow developers
are quite adept at ignoring the parts of the repo they don't need to
interact with.

[1] https://jwt.io/libraries

On Fri, Oct 21, 2022 at 8:48 AM Antoine Pitrou <anto...@python.org> wrote:
>
>
> Le 21/10/2022 à 17:35, David Li a écrit :
> > Maybe to take a step back - why do we want this in the Arrow
> > repositories/under Arrow governance?
> >
> > I'm excited to see more integrations and use cases for Flight and Flight
> > SQL in the wild, but I think it would be good to see a true ecosystem
> > around this, and so I don't think -every- integration needs to end up in
> > the Arrow repos. And there is a cost to set up CI, releases, etc. (ADBC is
> > still getting set up there, and my hope at least is that most integrations
> > will eventually be provided by the database systems, not by Arrow.)
> >
> > That said I'm not necessarily opposed. We've discussed similar 'contrib'
> > things in the past [1][2]. It may be worth reviewing the discussions there
> > and discussing how this project would address the criteria proposed.
>
> The problem is that Arrow is so broad nowadays that a "contrib" repo
> would end up hosting a hodgepodge of entirely disparate subprojects with
> no common maintenance/release policies, and disjoint development and
> user communities.
>
> A separate Apache repo for each subproject is probably better, even
> though there might be a small setup overhead.
>
> Regards
>
> Antoine.
>
> >
> > [1]: https://lists.apache.org/thread/nfr3tq1tb5tvr34zg5z7on8xglfsj79t
> > [2]: https://lists.apache.org/thread/yshp4b3g34kxovzvf6x48pzj0894qbw5
> > (though you may have to dig to find the responses - the UI didn't link them
> > up)
> >
> > On Fri, Oct 21, 2022, at 11:08, Kyle Brooks wrote:
> >> Hi David and Antoine,
> >>
> >> Long-term I completely agree that this should belong in Apache Spark.
> >> I also agree that Flight SQL or ADBC would be a good enhancement for
> >> users. We are planning on implementing Flight SQL support soon. ADBC
> >> doesn't look mature enough right now for this use case. We will keep
> >> an eye on it.
> >>
> >> Short-term, I'd like to propose either creating an Arrow contrib repo
> >> or making a separate Apache repo just for the Flight Spark Connector.
> >>
> >> We would need help facilitating this within Apache / Arrow.
> >>
> >> Thank you,
> >> Kyle
> >>
> >> On 2022/10/18 23:44:49 David Li wrote:
> >>> Given the probable need for IP clearance, getting it into Arrow would
> >>> also be a Process(TM) unfortunately. We also don't really have a great
> >>> place for "not quite in tree" projects; there have been discussions of a
> >>> 'contrib' repo in the past, but nothing has materialized.
> >>>
> >>> That said - have you shown this to Spark users? I'd guess there'd be more
> >>> enthusiasm there, especially if there are particular data source(s) you
> >>> anticipate this would make available to them. (Though again, Flight SQL
> >>> or ADBC over plain Flight RPC might be a more attractive target for
> >>> such a Spark plugin.)
> >>>
> >>> -David
> >>>
> >>> On Tue, Oct 18, 2022, at 16:50, Matt Phelps wrote:
> >>>> Hi David and Antoine,
> >>>>
> >>>> Thanks for your input.
> >>>> Based on past experience talking to some other Arrow /
> >>>> Spark developers, we anticipate that it would take a long time to get
> >>>> into Spark. Our plan was to build up a user base in the Arrow community
> >>>> before submitting for inclusion to Spark. Is there a place the code can
> >>>> live in the meantime?
> >>>>
> >>>> Matt Phelps
> >>>>
> >>>>
> >>>> From: Antoine Pitrou <an...@python.org>
> >>>> Date: Monday, October 17, 2022 at 2:48 PM
> >>>> To: dev@arrow.apache.org <de...@arrow.apache.org>
> >>>> Subject: Re: [DISCUSS] Integrate existing Spark connector for Flight
> >>>>
> >>>> Le 17/10/2022 à 21:27, David Li a écrit :
> >>>>> Hey Matt,
> >>>>>
> >>>>> This is cool to see. To be clear, this is an implementation of Spark
> >>>>> DataSourceV2 using Arrow Flight?
> >>>>>
> >>>>> I think the questions I have are:
> >>>>>
> >>>>> - Does this belong under Arrow, or under Spark - I lean towards it
> >>>>> being closer to Spark than Arrow;
> >>>>
> >>>> FWIW, that is my feeling as well.
> >>>>
> >>>> Regards
> >>>>
> >>>> Antoine.
> >>>