Maybe to take a step back - why do we want this in the Arrow repositories/under 
Arrow governance?

I'm excited to see more integrations and use cases for Flight and Flight SQL in 
the wild, but I think it would be good to see a true ecosystem around this, and 
so I don't think -every- integration needs to end up in the Arrow repos. And 
there is a cost to set up CI, releases, etc. (ADBC is still getting set up 
there, and my hope at least is that most integrations will eventually be 
provided by the database systems, not by Arrow.)

That said, I'm not necessarily opposed. We've discussed similar 'contrib' things 
in the past [1][2]. It may be worth reviewing the discussions there and 
discussing how this project would address the criteria proposed.

[1]: https://lists.apache.org/thread/nfr3tq1tb5tvr34zg5z7on8xglfsj79t
[2]: https://lists.apache.org/thread/yshp4b3g34kxovzvf6x48pzj0894qbw5 (though 
you may have to dig to find the responses - the UI didn't link them up)

On Fri, Oct 21, 2022, at 11:08, Kyle Brooks wrote:
> Hi David and Antoine,
>
> Long-term I completely agree that this should belong in Apache Spark.  
> I also agree that Flight SQL or ADBC would be a good enhancement for 
> users.  We are planning on implementing Flight SQL support soon.  ADBC 
> doesn't look mature enough right now for this use case.  We will keep 
> an eye on it.
>
> Short-term, I'd like to propose either creating an Arrow contrib repo 
> or making a separate Apache repo just for the Flight Spark Connector.
>
> We would need help facilitating this within Apache / Arrow.
>
> Thank you,
> Kyle
>
> On 2022/10/18 23:44:49 David Li wrote:
>> Given the probable need for IP clearance, getting it into Arrow would also 
>> be a Process(TM) unfortunately. We also don't really have a great place for 
>> "not quite in tree" projects; there have been discussions of a 'contrib' 
>> repo in the past, but nothing has materialized.
>> 
>> That said - have you shown this to Spark users? I'd guess there'd be more 
>> enthusiasm there, especially if there are particular data source(s) you 
>> anticipate this would make available to them. (Though again, Flight SQL or 
>> ADBC over plain Flight RPC might be a more attractive target for such 
>> a Spark plugin.)
>> 
>> -David
>> 
>> On Tue, Oct 18, 2022, at 16:50, Matt Phelps wrote:
>> > Hi David and Antoine,
>> >
>> > Thanks for your input. Based on past experience talking to some other Arrow / 
>> > Spark developers, we anticipate that it would take a long time to get 
>> > into Spark. Our plan was to build up a user base in the Arrow community 
>> > before submitting for inclusion to Spark. Is there a place the code can 
>> > live in the meantime?
>> >
>> > Matt Phelps
>> >
>> >
>> > From: Antoine Pitrou <[email protected]>
>> > Date: Monday, October 17, 2022 at 2:48 PM
>> > To: [email protected] <[email protected]>
>> > Subject: Re: [DISCUSS] Integrate existing Spark connector for Flight
>> >
>> > On 17/10/2022 at 21:27, David Li wrote:
>> >> Hey Matt,
>> >>
>> >> This is cool to see. To be clear, this is an implementation of Spark 
>> >> DataSourceV2 using Arrow Flight?
>> >>
>> >> I think the questions I have are:
>> >>
>> >> - Does this belong under Arrow, or under Spark - I lean towards it being 
>> >> closer to Spark than Arrow;
>> >
>> > FWIW, that is my feeling as well.
>> >
>> > Regards
>> >
>> > Antoine.
>>
