Hi! I've been using arrow/arrow-rs for a while now; my use case is to parse Arrow streaming files and convert them into CSV.
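As a concrete reference for that pipeline, here is a minimal sketch using the arrow crate's IPC StreamReader and CSV Writer; the exact constructor signatures vary between arrow-rs versions, so treat it as illustrative rather than drop-in code.

    use std::fs::File;
    use std::io::BufWriter;

    use arrow::csv::Writer;
    use arrow::error::Result;
    use arrow::ipc::reader::StreamReader;

    /// Decode an Arrow IPC streaming file and write it back out as CSV,
    /// one RecordBatch at a time so memory use stays bounded.
    fn stream_to_csv(input_path: &str, output_path: &str) -> Result<()> {
        let input = File::open(input_path)?;
        let reader = StreamReader::try_new(input)?;

        let output = BufWriter::new(File::create(output_path)?);
        let mut writer = Writer::new(output);

        for batch in reader {
            writer.write(&batch?)?;
        }
        Ok(())
    }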
Rust has been an absolutely fantastic tool for this; the performance is outstanding and I have had no issues using it for my use case. I would be happy to test out the branch and let you know what the performance is like, as I was going to improve the current implementation that I have for the CSV writer, since it takes a while for bigger (multi-GB) datasets.

Josh

On Thu, 27 May 2021 at 22:49, Jed Brown <j...@jedbrown.org> wrote:

> Andy Grove <andygrov...@gmail.com> writes:
>
> > Looking at this purely from the DataFusion/Ballista point of view, what I
> > would be interested in would be having a branch of DF that uses arrow2, and
> > once that branch has all tests passing and can run queries with performance
> > that is at least as good as the original arrow crate, then cut over.
> >
> > However, for developers using the arrow APIs directly, I don't see an easy
> > path. We either try to gradually PR the changes in (which seems really
> > hard given that there are significant changes to APIs and internal data
> > structures) or we port some portion of the existing tests over to arrow2
> > and then make that the official crate once all tests pass.
>
> How feasible would it be to make a legacy module in arrow2 that would
> enable (some large subset of) existing arrow users to try arrow2 after
> adjusting their use statements? (That is, implement the public-facing
> legacy interfaces in terms of arrow2's new, safe interface.) This would
> make it easier to test with DataFusion/Ballista and external users of the
> current arrow crate, then cut over and let those packages update
> incrementally from legacy to modern arrow2.
>
> I think it would be okay to tolerate some performance degradation when
> working through these legacy interfaces, so long as there was confidence
> that modernizing the callers would recover the performance (as tests have
> been showing).
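As an illustration of the legacy-module idea above, a rough sketch of what such a shim could look like follows. The `legacy` module, the wrapper type, and the delegating methods are hypothetical, and the arrow2 calls (Int32Array, from_slice, value) are assumptions about arrow2's API at the time rather than a settled interface.

    // Hypothetical "legacy" shim: types with the old arrow crate's shape whose
    // methods delegate to arrow2's safe implementation. None of this is actual
    // arrow2 code; it only illustrates the approach.
    pub mod legacy {
        use arrow2::array::{Array, Int32Array as Arrow2Int32Array};

        /// Same type name as the old crate, so callers would only change
        /// `use arrow::array::Int32Array` to the (hypothetical) legacy path.
        pub struct Int32Array {
            inner: Arrow2Int32Array,
        }

        impl Int32Array {
            pub fn from_slice(values: &[i32]) -> Self {
                // `Arrow2Int32Array::from_slice` is assumed here; the exact
                // constructor name may differ between arrow2 versions.
                Self { inner: Arrow2Int32Array::from_slice(values) }
            }

            /// Old-crate-shaped accessor, forwarding to arrow2.
            pub fn value(&self, i: usize) -> i32 {
                self.inner.value(i)
            }

            pub fn len(&self) -> usize {
                self.inner.len()
            }
        }
    }

Going through a wrapper like this adds a layer of indirection, which matches the point above about tolerating some performance loss until callers move from the legacy module to the native arrow2 API.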