I'm happy to provide a quarterly update on C++ engine work, but in the future I'll draft it in PR form so others have a chance to pitch in. I was inspired by, and hope to mimic, the Rust community's very cool quarterly roadmap [1][2] as a place to have higher-level discussions on what people are hoping to work on. Since the C++ implementation has quarterly releases, we can probably sync up with releases, so I'll start a discussion about halfway to the 8.0.0 release.

[1] https://docs.google.com/document/d/1t64vZwZnXm9MyFj2qz3xcAkSxK3Wu12giS3KrS4nDE0/edit
[2] https://github.com/apache/arrow-datafusion/pull/2133

On Mon, Apr 18, 2022 at 7:20 AM Will Jones <will.jones...@gmail.com> wrote:
>
> Thanks Weston for providing the update on the C++ compute engine. IMO, it
> would be very welcome to have that update be a quarterly email to the dev
> mailing list, and may provide an opportunity to highlight issues in Jira
> that are good first issues or neglected but important.
>
> On Wed, Apr 13, 2022 at 10:00 AM David Li <lidav...@apache.org> wrote:
>
> > Attendees:
> >
> > - David Li
> > - Eduardo Ponce
> > - Gavin Ray
> > - Ian Cook
> > - James Duong
> > - Matthew Topol
> > - Nic
> > - Niranda
> > - Raul Cumplido
> > - Rok
> > - Weston Pace
> > - Will Jones
> >
> > N.B. The Voltron Data folks have a scheduling conflict on 4/27 and will
> > not be able to host the fortnightly sync call. Is anyone available to run
> > the meeting that day?
> >
> > Agenda:
> >
> > 8.0.0 Release: targeting 4/21, please try to get PRs wrapped up in the
> > next ~1-2 weeks. See the ML post [1] for details, including a wiki page
> > listing outstanding issues. In particular, there are some Go PRs that could
> > use attention from an interested Go developer [2], as well as some temporal
> > kernel PRs that could use a review [3].
> >
> > Arrow C++ Compute Engine: Weston gave a status update; APIs/documentation
> > has been improved for users, though likely most will use it through an API
> > like Substrait; basic Substrait support has been added with forthcoming
> > improvements; more tooling to measure performance is being worked on;
> > general kernel execution overhead is being addressed with an eye towards
> > running smaller batches through the engine. An asof join implementation is
> > being worked on, and Go is working towards Substrait bindings to be able to
> > bind to the C++ engine.
> >
> > Kernel vectorization/SIMD: Eduardo has been looking at making some of the
> > primitive kernels (e.g. arithmetic) more easily autovectorized by the
> > compiler, testing a variety of approaches. See related discussion [4]. We
> > do not have benchmarks to evaluate compiler performance in this regard
> > generally, but we have manually inspected some compiler output and found
> > that not all compilers manage to do this with the current kernel
> > implementations. We also don't have a holistic way to evaluate this going
> > forward, nor do we have a sense for current benchmark coverage, though
> > possibly we could generate benchmarks. However, it was pointed out that
> > general engine performance is likely more important, and that current
> > profiling indicates kernels are not yet a bottleneck, though there may be
> > low-hanging fruit here.
> >
> > Flight/Flight SQL: we discussed the barriers to Flight SQL support in Go;
> > Flight SQL heavily uses union types which are not yet implemented. A
> > further proposal [5] has been submitted to extend the type metadata, please
> > take a look for those interested. The GetXdbcTypeInfo proposal was merged,
> > and the inline data proposal is still outstanding (but probably ready to
> > have a vote).
> >
> > IPC/Format: it was asked if there's an IPC structure for serializing a
> > single array to reduce overhead. Current APIs likely suffice but Niranda
> > may submit a separate discussion to explain further.
> >
> > [1]: https://lists.apache.org/thread/zk8hhynvy0bqvqpxk0868n5g0nmzbzbn
> > [2]: https://github.com/apache/arrow/pull/12158
> > [3]: https://github.com/apache/arrow/pull/12657
> > [4]: https://lists.apache.org/thread/8o7k4dt23chx3gn13rwkms38syyms489
> > [5]: https://lists.apache.org/thread/thvn89wg29gyctwycx2zjr4vvm2g80o6
> >
> > On Tue, Apr 12, 2022, at 16:17, Ian Cook wrote:
> > > Hi all,
> > >
> > > Our biweekly sync call is tomorrow at 12:00 noon Eastern time.
> > >
> > > The Zoom meeting URL for this and other biweekly Arrow sync calls is:
> > > https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09
> > >
> > > Alternatively, enter this information into the Zoom website or app to
> > > join the call:
> > > Meeting ID: 876 4903 3008
> > > Passcode: 958092
> > >
> > > Thanks,
> > > Ian
> >