> Maybe to take a step back - why do we want this in the Arrow 
> repositories/under Arrow governance?

I think this is the important question.  What is the goal here?

If the goal is to help spread awareness, then we can link to a repo
somewhere (e.g. a "projects that use Arrow" section or something). For
example, I could eventually see something like [1] for ADBC.

If the goal is to share some kind of CI infrastructure burden (e.g.
ensure a library runs everywhere that Arrow can run) then the contrib
repo might be more useful than a repo-per-project but I think we'll
need some more general discussion on how to make this happen.

If the goal is to share maintenance / development cost or find new
developers then I don't think any approach works.  Most Arrow
developers are quite adept at ignoring the parts of the repo they
don't need to interact with.

[1] https://jwt.io/libraries

On Fri, Oct 21, 2022 at 8:48 AM Antoine Pitrou <anto...@python.org> wrote:
>
>
> Le 21/10/2022 à 17:35, David Li a écrit :
> > Maybe to take a step back - why do we want this in the Arrow 
> > repositories/under Arrow governance?
> >
> > I'm excited to see more integrations and use cases for Flight and Flight 
> > SQL in the wild, but I think it would be good to see a true ecosystem 
> > around this, and so I don't think -every- integration needs to end up in 
> > the Arrow repos. And there is a cost to set up CI, releases, etc. (ADBC is 
> > still getting set up there, and my hope at least is that most integrations 
> > will eventually be provided by the database systems, not by Arrow.)
> >
> > That said I'm not necessarily opposed. We've discussed similar 'contrib' 
> > things in the past [1][2]. It may be worth reviewing the discussions there 
> > and discussing how this project would address the criteria proposed.
>
> The problem is that Arrow is so broad nowadays that a "contrib" repo
> would end up hosting a hodgepodge of entirely disparate subprojects with
> no common maintenance/release policies, and disjoint development and
> user communities.
>
> A separate Apache repo for each subproject is probably better, even
> though there might be a small setup overhead.
>
> Regards
>
> Antoine.
> >
> > [1]: https://lists.apache.org/thread/nfr3tq1tb5tvr34zg5z7on8xglfsj79t
> > [2]: https://lists.apache.org/thread/yshp4b3g34kxovzvf6x48pzj0894qbw5 
> > (though you may have to dig to find the responses - the UI didn't link them 
> > up)
> >
> > On Fri, Oct 21, 2022, at 11:08, Kyle Brooks wrote:
> >> Hi David and Antoine,
> >>
> >> Long-term I completely agree that this should belong in Apache Spark.
> >> I also agree that Flight SQL or ADBC would be a good enhancement for
> >> users.  We are planning on implementing Flight SQL support soon.  ADBC
> >> doesn't look mature enough right now for this use case.  We will keep
> >> an eye on it.
> >>
> >> Short-term, I'd like to propose either creating an Arrow contrib repo
> >> or making a separate Apache repo just for the Flight Spark Connector.
> >>
> >> We would need help facilitating this within Apache / Arrow.
> >>
> >> Thank you,
> >> Kyle
> >>
> >> On 2022/10/18 23:44:49 David Li wrote:
> >>> Given the probable need for IP clearance, getting it into Arrow would 
> >>> also be a Process(TM) unfortunately. We also don't really have a great 
> >>> place for "not quite in tree" projects; there have been discussions of a 
> >>> 'contrib' repo in the past, but nothing has materialized.
> >>>
> >>> That said - have you shown this to Spark users? I'd guess there'd be more 
> >>> enthusiasm there, especially if there are particular data source(s) you 
> >>> anticipate this would make available to them. (Though again, Flight SQL 
> >>> or ADBC over plain Flight RPC might be a more attractive target for
> >>> such a Spark plugin.)
> >>>
> >>> -David
> >>>
> >>> On Tue, Oct 18, 2022, at 16:50, Matt Phelps wrote:
> >>>> Hi David and Antoine,
> >>>>
> >>>> Thanks for your input. Based on past experience talking to some other Arrow /
> >>>> Spark developers, we anticipate that it would take a long time to get
> >>>> into Spark. Our plan was to build up a user base in the Arrow community
> >>>> before submitting it for inclusion in Spark. Is there a place the code can
> >>>> live in the meantime?
> >>>>
> >>>> Matt Phelps
> >>>>
> >>>>
> >>>> From: Antoine Pitrou <an...@python.org>
> >>>> Date: Monday, October 17, 2022 at 2:48 PM
> >>>> To: dev@arrow.apache.org <de...@arrow.apache.org>
> >>>> Subject: Re: [DISCUSS] Integrate existing Spark connector for Flight
> >>>>
> >>>> Le 17/10/2022 à 21:27, David Li a écrit :
> >>>>> Hey Matt,
> >>>>>
> >>>>> This is cool to see. To be clear, this is an implementation of Spark 
> >>>>> DataSourceV2 using Arrow Flight?
> >>>>>
> >>>>> I think the questions I have are:
> >>>>>
> >>>>> - Does this belong under Arrow, or under Spark - I lean towards it 
> >>>>> being closer to Spark than Arrow;
> >>>>
> >>>> FWIW, that is my feeling as well.
> >>>>
> >>>> Regards
> >>>>
> >>>> Antoine.
> >>>
