tustvold commented on issue #2394: URL: https://github.com/apache/arrow-rs/issues/2394#issuecomment-1213028137
_The following is potentially somewhat subjective, so take it with a grain of salt, but I think it is fair_

> column reader

The low-level [column](https://docs.rs/parquet/latest/parquet/column/index.html) API is still actively developed, insofar as the arrow internals make use of it. However, it is worth noting that the arrow internals decode to their own buffer implementations instead of using `[DataType::T]`, as this is prohibitively expensive, especially for byte arrays. This extension mechanism is not currently exposed outside the crate, as it is relatively unstable. If you use this interface you will need to perform record reassembly yourself.

> page reader

I presume you're referring to the [file](https://docs.rs/parquet/latest/parquet/file/index.html) APIs here. If so, these are still actively developed, as they are used by the arrow API without any major caveats when operating on local files.

> are there any other

The only high-level interface that I would describe as actively maintained is [arrow](https://docs.rs/parquet/latest/parquet/arrow/index.html). It is where most development effort is currently focused, with significant effort expended to make it fast and feature complete, and to add advanced functionality such as predicate pushdown, async IO, etc.

Whilst arrow may be a somewhat heavy dependency, there are ongoing improvements in this space, and I believe the additional performance, especially for dictionary encoded or variable length types, more than makes up for this. Perhaps we could add more feature flags to arrow-rs to reduce its size as a dependency; would it then work for your use-case?
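For readers unfamiliar with the term, the "record reassembly" mentioned for the column API can be sketched roughly as follows. This is a minimal illustration and not code from the parquet crate: the function name `reassemble` is hypothetical, and it assumes a single `repeated int32` column with a maximum repetition level of 1 and a maximum definition level of 1, where the three slices stand in for the flat repetition levels, definition levels, and values that a column reader hands back.

```rust
/// Rebuild the rows of one `repeated int32` column from flat level/value buffers.
///
/// Assumptions (hypothetical, for illustration only):
/// - max repetition level is 1: `0` starts a new row, `1` continues the current row
/// - max definition level is 1: `1` means a value is present, `0` marks an empty list
/// - `values` holds one entry per definition level equal to 1, in order
fn reassemble(rep_levels: &[i16], def_levels: &[i16], values: &[i32]) -> Vec<Vec<i32>> {
    assert_eq!(rep_levels.len(), def_levels.len(), "one level pair per slot");

    let mut rows: Vec<Vec<i32>> = Vec::new();
    let mut next_value = values.iter().copied();

    for (&rep, &def) in rep_levels.iter().zip(def_levels) {
        // A repetition level of 0 marks the start of a new row.
        if rep == 0 {
            rows.push(Vec::new());
        }
        // Only fully defined slots have a corresponding entry in `values`.
        if def == 1 {
            let value = next_value
                .next()
                .expect("fewer values than the definition levels imply");
            rows.last_mut()
                .expect("the first repetition level must be 0")
                .push(value);
        }
    }
    rows
}
```

Under these assumptions, calling `reassemble(&[0, 1, 0, 0], &[1, 1, 0, 1], &[1, 2, 3])` yields the three rows `[1, 2]`, `[]`, and `[3]`. Real schemas with deeper nesting or optional fields need higher maximum levels and more bookkeeping, which is the work the arrow interface does for you.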
