jorgecarleitao commented on issue #1176:
URL: https://github.com/apache/arrow-rs/issues/1176#issuecomment-1430883886

   > I would just like to get away from this situation where we have two 
concurrent projects. [...]
   
   I agree that the situation is not productive, and I am sorry that I 
caused frustration to people here.
   
   > Whilst I do not like the idea of porting stuff across, and yes it would be 
an annoying use of time, I am willing to contribute to such an effort if it 
sees an end to this situation.
   
   I am also willing to contribute to such an effort.
   
   What do you think about something to the effect of:
   
   * Arrow2 is donated to Apache Arrow and its development ceases in 
jorgecarleitao/arrow2
   * The core of arrow2 (`array/`, `bitmap/`, `offsets.rs`, `types/`) is 
lifted to a crate living in this repo (e.g. `arrow-core` or something).
   * The core receives the relevant methods from arrow-rs; methods existing in 
arrow-rs are added with a "deprecated" marker to give arrow-rs users time to 
migrate.
   * Arrow2's FFI is moved to a separate crate and replaces 
arrow-rs' one
   * Arrow-rs' compute is migrated to use `arrow-core`
   * Arrow-rs' IO except IPC is migrated to use `arrow-core`
   * Arrow-rs' IPC IO is replaced by arrow2's implementation with necessary 
adjustments
   * arrow-core will add `RunEndArray` (this is missing there atm)
   * `RecordBatch` (arrow-rs) and `Chunk` (arrow2) co-exist to give room for 
both communities to use (in core or something else)
   * Development and governance follow Apache's and this repo's model of 
community-driven development.
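
   For context on the `RunEndArray` item, here is a minimal sketch of what 
run-end encoding looks like. The `RunEnd` struct and its methods are 
hypothetical names for illustration only; they are not the arrow2 or arrow-rs 
API. The sketch assumes cumulative run ends, with one value per run.

```rust
/// Hypothetical run-end encoded column (illustration only).
/// `run_ends[i]` is the exclusive logical end index of run `i`;
/// `values[i]` is the value repeated throughout that run.
struct RunEnd<T> {
    run_ends: Vec<i32>,
    values: Vec<T>,
}

impl<T> RunEnd<T> {
    /// Logical length: the last run end, or 0 when there are no runs.
    fn len(&self) -> usize {
        self.run_ends.last().map_or(0, |&end| end as usize)
    }

    /// Look up a logical index by binary-searching the run ends.
    fn get(&self, index: usize) -> Option<&T> {
        if index >= self.len() {
            return None;
        }
        // First run whose end is strictly greater than `index`.
        let run = self
            .run_ends
            .partition_point(|&end| (end as usize) <= index);
        self.values.get(run)
    }
}
```

   With `run_ends = [2, 5, 6]` and `values = ["a", "b", "c"]`, the logical 
array is `a a b b b c`: six slots backed by three stored values.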
   
   This could result in the following changes to arrow-rs:
   * ArrayData is removed
   * It becomes interoperable with `Vec` but no longer aligned with cache lines.
   * IPC support is improved with e.g. unsafe free, big endian support, mmap of 
IPC files
   * There is code churn related to `from` vs `from_slice` (we can switch to 
arrow-rs names)
   * Slicing of arrays becomes easier (handling of offsets)
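
   To illustrate the `Vec` trade-off, a rough sketch: a buffer that takes 
ownership of a `Vec` needs no copy, but it inherits the `Vec`'s allocation, 
which is only guaranteed to be aligned for the element type. `VecBuffer`, 
`CACHE_LINE` and the helper functions are hypothetical names, and 64 bytes is 
an assumed cache-line size.

```rust
use std::mem::align_of;

/// Assumed cache-line size for illustration (64 bytes is common on x86-64).
const CACHE_LINE: usize = 64;

/// Returns true if the slice's start address is a multiple of `align`.
fn is_aligned_to<T>(values: &[T], align: usize) -> bool {
    (values.as_ptr() as usize) % align == 0
}

/// Always true for a slice backed by a `Vec<T>`: Rust guarantees this much.
fn meets_type_alignment<T>(values: &[T]) -> bool {
    is_aligned_to(values, align_of::<T>())
}

/// May be true or false for a `Vec<T>`: the allocator makes no such promise.
fn is_cache_line_aligned<T>(values: &[T]) -> bool {
    is_aligned_to(values, CACHE_LINE)
}

/// Hypothetical buffer that wraps a `Vec` without copying it.
struct VecBuffer<T> {
    data: Vec<T>,
}

impl<T> From<Vec<T>> for VecBuffer<T> {
    /// Zero-copy: the heap allocation is moved, not reallocated.
    fn from(data: Vec<T>) -> Self {
        Self { data }
    }
}

impl<T> VecBuffer<T> {
    /// Hand the allocation back as a plain `Vec`, again without copying.
    fn into_vec(self) -> Vec<T> {
        self.data
    }
}
```

   The design choice is that round-tripping through `Vec` is free, at the 
cost of no longer being able to assume 64-byte-aligned buffers.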
   
   It would also end the arrow-arrow2 split, e.g. removing the unproductive 
discussions around "which is better", and combine development efforts.
   
   Some challenges:
   * Arrow2 uses `Box<dyn Array>` as children to allow easy mutation; arrow-rs 
uses `Arc<dyn Array>`
   * Arrow2 does not have `TimestampArray` nor `DecimalArray`, and instead 
sticks to the physical types only
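
   A small sketch of why the `Box` vs `Arc` difference matters for mutation. 
The `Array` trait and `Int32Array` struct below are simplified stand-ins, not 
the real traits from either crate. A boxed child is uniquely owned and can 
always be mutated; a child behind an `Arc` can only be mutated in place while 
no other handle to it exists.

```rust
use std::sync::Arc;

/// Simplified stand-in for an array trait (not the arrow2/arrow-rs trait).
trait Array {
    fn len(&self) -> usize;
    fn push_null(&mut self);
}

/// Simplified stand-in for a concrete array type.
struct Int32Array {
    values: Vec<Option<i32>>,
}

impl Array for Int32Array {
    fn len(&self) -> usize {
        self.values.len()
    }

    fn push_null(&mut self) {
        self.values.push(None);
    }
}

/// Child held in a `Box`: unique ownership, so mutation always succeeds.
fn grow_boxed(child: &mut Box<dyn Array>) {
    child.push_null();
}

/// Child held in an `Arc`: in-place mutation succeeds only when this is the
/// sole handle. Returns false (and changes nothing) when the child is shared.
fn grow_shared(child: &mut Arc<dyn Array>) -> bool {
    match Arc::get_mut(child) {
        Some(inner) => {
            inner.push_null();
            true
        }
        None => false,
    }
}
```

   In exchange, `Arc` makes cloning a child cheap (a reference-count bump), 
whereas cloning a boxed child requires a deep copy.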
   


