alamb opened a new issue #1176:
URL: https://github.com/apache/arrow-rs/issues/1176


   TLDR: please comment on this ticket if you have opinions about whether and how 
the community should unite its efforts on a single Rust implementation of 
Apache Arrow. 
   
   There is an active 
[discussion](https://github.com/apache/arrow-datafusion/issues/1532) and a PR 
https://github.com/apache/arrow-datafusion/pull/1556 about switching the 
DataFusion project to use the [arrow2](https://github.com/jorgecarleitao/arrow2) 
Rust implementation of Arrow from @jorgecarleitao. While this DataFusion PR 
is not yet ready to merge, if DataFusion *were* to switch to `arrow2`, that 
would leave open the question of what happens to this (`arrow-rs`) code. 
   
   Since many of the PRs, contributors and maintainers of this (`arrow-rs`) crate 
are part of the DataFusion community, I believe that if DataFusion switches to 
`arrow2`, much of the maintenance and extension effort would follow to `arrow2`.
   
   `arrow2` is largely developed by @jorgecarleitao, who is an Apache Arrow PMC 
member and committer, but the project itself has not been under the Apache 
Software Foundation’s governance. Additional background can be found on the 
[mailing list archives](https://lists.apache.org/[email protected]:2021-4:arrow) 
and past mailing list threads such as 
[this](https://lists.apache.org/thread/dfkpszn3rhhz669g0sbmfcrlxv0nsho1) and 
[this](https://lists.apache.org/thread/cx5kr7rhy25o9mb5hcjlncndvjvvkybj).
   
   It is my opinion that the Rust / Arrow / DataFusion community has general 
consensus on:
   1. Having one implementation of Arrow in Rust where we can focus would be 
better than two, which split attention and resources
   2. The technical underpinnings of `arrow2` are more ergonomic
   
   It is not clear to me if there is a consensus on:
   1. How important the Apache Governance model is (please lend your opinions 
here!)
   2. How important the stability of APIs and the specific versioning scheme 
(`0.x` vs `1.x` or later) are
   
   Possible ideas for a way forward:
   1. Switch DataFusion to `arrow2`, making no changes to `arrow-rs`. It could 
be maintained by anyone who wished to contribute.
   2. Bring the `arrow2` code into the arrow-rs repo, with appropriate IP clearance, 
and adopt that as the officially maintained Arrow implementation (*)
   3. Start more actively porting the more ergonomic parts of arrow2 into 
arrow-rs to reduce the feature gap as suggested in 
https://github.com/apache/arrow-datafusion/issues/1532#issuecomment-1012985001  
by @tustvold 
   4. Others?
   
   (*) Option 2 leaves open the question of “how does arrow2 development move 
forward” – where would patches be sent, for example? I would hope we can find a 
way that is compatible with Apache governance, but I don't think we have a 
specific proposal yet, and it also depends in large part on what 
@jorgecarleitao is comfortable with.
   
   So, for any users of this crate who are not also in the DataFusion community: what 
are your hopes / needs / plans for this crate? How important is Apache 
governance to you? Please tell us your thoughts!

