alamb opened a new issue, #9423: URL: https://github.com/apache/arrow-rs/issues/9423
> > In any case, at least at my company we probably have a few PiB of data written with this or an even earlier version.
>
> BTW this is so cool to hear

Really want to make sure I don't spam this PR with too much side info, but I just wanted to use this opportunity to share that I (+ a coworker) will give the talk "Scaling Data Processing for Training Workloads at DeepL Research with Rust" at this year's PyCon DE / PyData in Darmstadt (Germany), where we go a bit into detail about this!

Working with `arrow-rs` (+ `PyO3` as the Python binding layer) has been an absolute blast so far for coming up with a highly optimized and efficient deep learning data ingress pipeline. Especially compared to `pyarrow`, we've rarely or never seen any issues concerning surprisingly high resource usage, memory leaks or randomly unsupported features (I'm somewhat sure selectively decoding specific rows by row index to reduce memory usage during sparse decoding isn't possible in a non-clunky way with `pyarrow`, and with `arrow-rs`'s `RowSelection` this was trivially easy, even as a feature exposed to Python).

Happy to stay connected on this topic.

_Originally posted by @jonded94 in https://github.com/apache/arrow-rs/issues/9374#issuecomment-3909160504_
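For readers unfamiliar with the row-index use case mentioned in the comment, here is a minimal sketch of the idea, not the commenter's actual code. A sorted list of row indices is turned into alternating "skip" and "select" runs, which is the shape a `RowSelection` is built from. The `Run` enum and `runs_from_indices` function are hypothetical names invented for this illustration, and the sketch uses only the standard library so it does not depend on any particular `parquet` crate version.

```rust
/// Hypothetical stand-in for a run of rows that is either skipped or decoded.
/// In the `parquet` crate the analogous values would, as far as I know, be
/// created with `RowSelector::skip(n)` and `RowSelector::select(n)`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Run {
    Skip(usize),
    Select(usize),
}

/// Convert strictly increasing row indices into alternating skip/select runs.
/// `total_rows` is the number of rows in the file or row group being read.
/// Panics if the indices are unsorted, duplicated, or out of range.
fn runs_from_indices(indices: &[usize], total_rows: usize) -> Vec<Run> {
    let mut runs = Vec::new();
    let mut cursor = 0; // first row not yet covered by any run
    let mut i = 0;
    while i < indices.len() {
        let start = indices[i];
        assert!(
            start >= cursor && start < total_rows,
            "indices must be strictly increasing and in range"
        );
        // Extend the selected run while the indices stay consecutive.
        let mut end = start + 1;
        i += 1;
        while i < indices.len() && indices[i] == end {
            end += 1;
            i += 1;
        }
        assert!(end <= total_rows, "index out of range");
        // Skip the gap between the previous run and this one, if any.
        if start > cursor {
            runs.push(Run::Skip(start - cursor));
        }
        runs.push(Run::Select(end - start));
        cursor = end;
    }
    // Skip whatever remains after the last selected row.
    if cursor < total_rows {
        runs.push(Run::Skip(total_rows - cursor));
    }
    runs
}
```

Calling `runs_from_indices(&[2, 3, 7], 10)` describes a read that decodes rows 2, 3 and 7 of a 10-row input and skips the rest. With the real crate, the equivalent selectors would presumably be collected into a `RowSelection` and handed to the reader builder via `with_row_selection`; treat that mapping as an assumption to check against the crate documentation.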
