jorgecarleitao commented on issue #1532: URL: https://github.com/apache/arrow-datafusion/issues/1532#issuecomment-1009372093
Thank you for considering using arrow2, very excited about this!

To provide some selling points, the primary goals of the repo have been:

* be a place to innovate both on Arrow and on Rust
* be sound and use the least amount of `unsafe`, pass Miri checks, have a curated selection of dependencies
* be panic-free on untrusted input to be usable in the context of web servers
* be idiomatic via iterators, `std::Vec`, [`MutableArray` API](https://github.com/jorgecarleitao/arrow2/blob/main/src/array/primitive/mutable.rs#L17), `Scalar` API, [easy to follow structs](https://github.com/jorgecarleitao/arrow2/blob/main/src/array/primitive/mod.rs#L34), etc.
* be performant via SIMD, trusted len, and fast implementations
* support `sync` and `async` IO with APIs that decouple blocking from non-blocking tasks
* be interoperable with other formats such as Arrow, Avro, Parquet and ORC, including mandatory integration tests against corresponding reference implementations
* be modular / easy to compile via [feature flags over almost all functionality](https://github.com/jorgecarleitao/arrow2/blob/main/Cargo.toml#L104)
* support WASM
* be maintainable via macros to reduce code duplication, [user guide](https://jorgecarleitao.github.io/arrow2/), [examples](https://github.com/jorgecarleitao/arrow2/tree/main/examples) and general avoidance of `unsafe`

At the moment, it is the fastest implementation of Apache Parquet IO and Apache Avro IO that I can find (both read and write), with both supporting `sync` and `async` execution and implemented in safe Rust (all IO in the crate is `unsafe`-free).
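
To make the "idiomatic via iterators, `std::Vec`, `MutableArray` API" point a bit more concrete, here is a minimal std-only sketch of the general pattern: a growable array backed by plain `Vec`s that is later frozen into a cheaply clonable, immutable one. The names `MutableInts` and `Ints` are hypothetical and exist only for this illustration; this is not arrow2's actual API. It also assumes a `Vec<bool>` for validity purely for readability, whereas the Arrow format packs validity into a bitmap.

```rust
use std::iter::FromIterator;
use std::sync::Arc;

/// Hypothetical growable array: values live in a plain `Vec`,
/// validity (null / not null) in a parallel `Vec<bool>`.
#[derive(Debug, Default)]
pub struct MutableInts {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl MutableInts {
    pub fn new() -> Self {
        Self::default()
    }

    /// Pushes a value or a null. A null still stores a placeholder (0)
    /// so that `values` and `validity` stay the same length.
    pub fn push(&mut self, value: Option<i32>) {
        self.values.push(value.unwrap_or_default());
        self.validity.push(value.is_some());
    }

    /// Freezes into an immutable array whose buffers are shared via `Arc`.
    pub fn freeze(self) -> Ints {
        Ints {
            values: Arc::from(self.values),
            validity: Arc::from(self.validity),
        }
    }
}

/// Lets callers build the array with `.collect()`.
impl FromIterator<Option<i32>> for MutableInts {
    fn from_iter<I: IntoIterator<Item = Option<i32>>>(iter: I) -> Self {
        let mut array = MutableInts::new();
        for value in iter {
            array.push(value);
        }
        array
    }
}

/// Hypothetical immutable array; cloning only bumps reference counts.
#[derive(Debug, Clone)]
pub struct Ints {
    values: Arc<[i32]>,
    validity: Arc<[bool]>,
}

impl Ints {
    pub fn len(&self) -> usize {
        self.values.len()
    }

    pub fn null_count(&self) -> usize {
        self.validity.iter().filter(|ok| !**ok).count()
    }

    /// Iterates over the slots as `Option<i32>`, yielding `None` for nulls.
    pub fn iter(&self) -> impl Iterator<Item = Option<i32>> + '_ {
        self.values
            .iter()
            .zip(self.validity.iter())
            .map(|(value, ok)| if *ok { Some(*value) } else { None })
    }
}

fn main() {
    let array = vec![Some(1), None, Some(3)]
        .into_iter()
        .collect::<MutableInts>()
        .freeze();
    // `flatten` skips the nulls.
    let sum: i32 = array.iter().flatten().sum();
    println!("len={} nulls={} sum={}", array.len(), array.null_count(), sum);
    // prints "len=3 nulls=1 sum=4"
}
```

The split between a mutable builder and an immutable, shareable array is the design choice being illustrated: construction stays ordinary `Vec` code, and consumers only ever see read-only buffers.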

The crate is under active development, both in volume (~800 commits in a year) and in exploring different ideas, such as:

* [switched to `std::Vec`](https://github.com/jorgecarleitao/arrow2/pull/693)
* [Removed `dict_id` and `dict_is_ordered` from `Field`](https://github.com/jorgecarleitao/arrow2/pull/713)
* [Replaced `RecordBatch` by `Chunk`](https://github.com/jorgecarleitao/arrow2/pull/717)
* [investigate switching to portable simd](https://github.com/jorgecarleitao/arrow2/issues/580)
* [investigate switching to a safer implementation of flatbuffers](https://github.com/jorgecarleitao/arrow2/issues/725)
* [investigate enable copy on write semantics](https://github.com/jorgecarleitao/arrow2/issues/741)
* [investigate mutable arrays in compute](https://github.com/jorgecarleitao/arrow2/issues/627)

(which is a major reason why it is 0.X, to allow space to try things out)

The crate has been adopted by [Polars](https://github.com/pola-rs/polars), [databend](https://github.com/datafuselabs/databend), [grafana's SDK for Rust](https://crates.io/crates/grafana-plugin-sdk) and is interoperable with [connectorx](https://github.com/sfu-db/connector-x).

Releases have been happening about once a month (breaking), and on demand for bug fixes. The next is planned for the end of this week.

I hope this offers a general idea of what the crate is and where it is heading.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org