jorgecarleitao commented on issue #1532:
URL: https://github.com/apache/arrow-datafusion/issues/1532#issuecomment-1009372093


   Thank you for considering arrow2; I am very excited about this!
   
   To provide some selling points, the primary goals of the repo have been:
   * be a place to innovate both on Arrow and on Rust
   * be sound, use as little `unsafe` as possible, pass Miri checks, and have a 
curated selection of dependencies
   * be panic-free on untrusted input to be usable in the context of web servers
   * be idiomatic via iterators, `std::Vec`, [`MutableArray` 
API](https://github.com/jorgecarleitao/arrow2/blob/main/src/array/primitive/mutable.rs#L17),
 `Scalar` API, [easy to follow 
structs](https://github.com/jorgecarleitao/arrow2/blob/main/src/array/primitive/mod.rs#L34),
 etc.
   * be performant via SIMD, trusted len, and fast implementations
   * support `sync` and `async` IO with APIs that decouple blocking from 
non-blocking tasks
   * be interoperable with other formats such as Arrow, Avro, Parquet and ORC, 
including mandatory integration tests against corresponding reference 
implementations
   * be modular / easy to compile via [feature flags over almost all 
functionality](https://github.com/jorgecarleitao/arrow2/blob/main/Cargo.toml#L104)
   * support WASM
   * be maintainable via macros to reduce code duplication, [user 
guide](https://jorgecarleitao.github.io/arrow2/), 
[examples](https://github.com/jorgecarleitao/arrow2/tree/main/examples) and 
general avoidance of `unsafe`
   
   At the moment, it is the fastest implementation of Apache Parquet IO and 
Apache Avro IO that I can find (both read and write). Both support `sync` and 
`async` execution and are implemented in safe Rust (all IO in the crate is 
`unsafe`-free).
   
   The crate is under active development, both in volume (~800 commits in a 
year) and in exploring different ideas, such as
   * [switched to `std::Vec`](https://github.com/jorgecarleitao/arrow2/pull/693)
   * [Removed `dict_id` and `dict_is_ordered` from 
`Field`](https://github.com/jorgecarleitao/arrow2/pull/713)
   * [Replaced `RecordBatch` by 
`Chunk`](https://github.com/jorgecarleitao/arrow2/pull/717)
   * [investigate switching to portable 
simd](https://github.com/jorgecarleitao/arrow2/issues/580)
   * [investigate switching to a safer implementation of 
flatbuffers](https://github.com/jorgecarleitao/arrow2/issues/725)
   * [investigate enabling copy-on-write 
semantics](https://github.com/jorgecarleitao/arrow2/issues/741)
   * [investigate mutable arrays in 
compute](https://github.com/jorgecarleitao/arrow2/issues/627)
   
   (This experimentation is a major reason why the crate is still 0.X: it 
leaves room to try things out.)
   
   The crate has been adopted by [Polars](https://github.com/pola-rs/polars), 
[databend](https://github.com/datafuselabs/databend), [grafana's SDK for 
Rust](https://crates.io/crates/grafana-plugin-sdk) and is interoperable with 
[connectorx](https://github.com/sfu-db/connector-x).
   
   Breaking releases have been happening about once a month, with bug-fix 
releases on demand. The next one is planned for the end of this week.
   
   I hope this offers a general idea of what the crate is and where it is 
heading.

