alamb opened a new issue, #9423:
URL: https://github.com/apache/arrow-rs/issues/9423

   > > In any case, at least at my company we probably have a few PiB of data written with this or an even earlier version.
   >
   > BTW this is so cool to hear
   
   I really want to make sure I don't spam this PR with too much side info, but 
I just wanted to use this opportunity to share that a coworker and I will give 
the talk "Scaling Data Processing for Training Workloads at DeepL Research with 
Rust" at this year's PyCon DE / PyData in Darmstadt (Germany), where we will go 
into a bit of detail about this! 
   
   Working with `arrow-rs` (+ `PyO3` as the Python binding layer) has been an 
absolute blast so far for coming up with a highly optimized and efficient deep 
learning data ingress pipeline. 
   
   Especially compared to `pyarrow`, we've rarely if ever seen issues with 
surprisingly high resource usage, memory leaks, or unexpectedly unsupported 
features. (I'm fairly sure that selectively decoding specific rows by row 
index, to reduce memory usage during sparse decoding, isn't possible in a 
non-clunky way with `pyarrow`, whereas with `arrow-rs`'s `RowSelection` it was 
trivially easy, even as a feature exposed to Python.) Happy to stay connected 
on this topic.
   
   _Originally posted by @jonded94 in 
https://github.com/apache/arrow-rs/issues/9374#issuecomment-3909160504_
               

