alamb opened a new issue #1176: URL: https://github.com/apache/arrow-rs/issues/1176
TLDR: please comment on this ticket if you have opinions about whether and how the community should unite its efforts on a single Rust implementation of Apache Arrow.

There is active [discussion](https://github.com/apache/arrow-datafusion/issues/1532) and a PR https://github.com/apache/arrow-datafusion/pull/1556 about switching the DataFusion project to use the [arrow2](https://github.com/jorgecarleitao/arrow2) Rust implementation of Arrow from @jorgecarleitao.

While this DataFusion PR is not yet ready to merge, if DataFusion *were* to switch to `arrow2`, that leaves the question of what will happen with this (`arrow-rs`) code. Since many of the PRs, contributors and maintainers of this (`arrow-rs`) crate are part of the DataFusion community, I believe that if DataFusion switches to `arrow2`, much of the maintenance and extension effort would follow to `arrow2`.

`arrow2` is largely developed by @jorgecarleitao, who is an Apache Arrow PMC member and committer, but the project itself has not been under the Apache Software Foundation's governance.

Additional background can be found in the [mailing list archives](https://lists.apache.org/[email protected]:2021-4:arrow) and in past mailing list threads such as [this](https://lists.apache.org/thread/dfkpszn3rhhz669g0sbmfcrlxv0nsho1) and [this](https://lists.apache.org/thread/cx5kr7rhy25o9mb5hcjlncndvjvvkybj).

It is my opinion that the Rust / Arrow / DataFusion community has general consensus on:
1. Having one implementation of Arrow in Rust where we can focus would be better than two, which split attention and resources
2. The technical underpinnings of `arrow2` are more ergonomic

It is not clear to me if there is a consensus on:
1. How important the Apache Governance model is (please lend your opinions here!)
2. How important the stability of APIs and the specific versioning scheme (`0.x` vs `1.x` or later) are

Possible ideas for a way forward:
1. Switch DataFusion to `arrow2`, making no changes to `arrow-rs`. It could be maintained by anyone who wished to contribute.
2. Bring the `arrow2` code into the `arrow-rs` repo, with appropriate IP clearance, and adopt that as the officially maintained Arrow implementation (*)
3. Start more actively porting the more ergonomic parts of `arrow2` into `arrow-rs` to reduce the feature gap, as suggested in https://github.com/apache/arrow-datafusion/issues/1532#issuecomment-1012985001 by @tustvold
4. Others?

Option 2 leaves open the question of "how does arrow2 development move forward": where would patches be sent, for example? I would hope we can find a way that is compatible with Apache governance, but I don't think we have a specific proposal yet, and it also depends in large part on what @jorgecarleitao is comfortable with.

So, for any users of this crate who are not also in the DataFusion community: what are your hopes / needs / plans for this crate? How important is Apache governance to you?

Please tell us your thoughts!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
