I'm happy to provide a quarterly update on C++ engine work, but in the
future I'll draft it in PR form so others have a chance to pitch in.
I was inspired by, and hope to mimic, the Rust community's very cool
quarterly roadmap [1][2] as a place to have higher-level discussions
on what people are hoping to work on.  Since the C++ implementation
has quarterly releases, we can probably sync up with releases, so I'll
start a discussion about halfway to the 8.0.0 release.

[1] https://docs.google.com/document/d/1t64vZwZnXm9MyFj2qz3xcAkSxK3Wu12giS3KrS4nDE0/edit
[2] https://github.com/apache/arrow-datafusion/pull/2133

On Mon, Apr 18, 2022 at 7:20 AM Will Jones <will.jones...@gmail.com> wrote:
>
> Thanks Weston for providing the update on the C++ compute engine. IMO, it
> would be very welcome to have that update be a quarterly email to the dev
> mailing list, and may provide an opportunity to highlight issues in Jira
> that are good first issues or neglected but important.
>
> On Wed, Apr 13, 2022 at 10:00 AM David Li <lidav...@apache.org> wrote:
>
> > Attendees:
> >
> > - David Li
> > - Eduardo Ponce
> > - Gavin Ray
> > - Ian Cook
> > - James Duong
> > - Matthew Topol
> > - Nic
> > - Niranda
> > - Raul Cumplido
> > - Rok
> > - Weston Pace
> > - Will Jones
> >
> > N.B. The Voltron Data folks have a scheduling conflict on 4/27 and will
> > not be able to host the fortnightly sync call. Is anyone available to run
> > the meeting that day?
> >
> > Agenda:
> >
> > 8.0.0 Release: targeting 4/21, please try to get PRs wrapped up in the
> > next ~1-2 weeks. See the ML post [1] for details, including a wiki page
> > listing outstanding issues. In particular, there are some Go PRs that could
> > use attention from an interested Go developer [2], as well as some temporal
> > kernel PRs that could use a review [3].
> >
> > Arrow C++ Compute Engine: Weston gave a status update; APIs/documentation
> > have been improved for users, though likely most will use it through an API
> > like Substrait; basic Substrait support has been added with forthcoming
> > improvements; more tooling to measure performance is being worked on;
> > general kernel execution overhead is being addressed with an eye towards
> > running smaller batches through the engine. An asof join implementation is
> > being worked on, and Go is working towards Substrait bindings to be able to
> > bind to the C++ engine.
> >
> > Kernel vectorization/SIMD: Eduardo has been looking at making some of the
> > primitive kernels (e.g. arithmetic) more easily autovectorized by the
> > compiler, testing a variety of approaches. See related discussion [4]. We
> > do not have benchmarks to evaluate compiler performance in this regard
> > generally, but we have manually inspected some compiler output and found
> > that not all compilers manage to do this with the current kernel
> > implementations. We also don't have a holistic way to evaluate this going
> > forward, nor do we have a sense for current benchmark coverage, though
> > possibly we could generate benchmarks. However, it was pointed out that
> > general engine performance is likely more important, and that current
> > profiling indicates kernels are not yet a bottleneck, though there may be
> > low-hanging fruit here.
> >
> > Flight/Flight SQL: we discussed the barriers to Flight SQL support in Go;
> > Flight SQL heavily uses union types, which are not yet implemented. A
> > further proposal [5] has been submitted to extend the type metadata; those
> > interested, please take a look. The GetXdbcTypeInfo proposal was merged,
> > and the inline data proposal is still outstanding (but probably ready to
> > have a vote).
> >
> > IPC/Format: it was asked if there's an IPC structure for serializing a
> > single array to reduce overhead. Current APIs likely suffice but Niranda
> > may submit a separate discussion to explain further.
> >
> > [1]: https://lists.apache.org/thread/zk8hhynvy0bqvqpxk0868n5g0nmzbzbn
> > [2]: https://github.com/apache/arrow/pull/12158
> > [3]: https://github.com/apache/arrow/pull/12657
> > [4]: https://lists.apache.org/thread/8o7k4dt23chx3gn13rwkms38syyms489
> > [5]: https://lists.apache.org/thread/thvn89wg29gyctwycx2zjr4vvm2g80o6
> >
> > On Tue, Apr 12, 2022, at 16:17, Ian Cook wrote:
> > > Hi all,
> > >
> > > Our biweekly sync call is tomorrow at 12:00 noon Eastern time.
> > >
> > > The Zoom meeting URL for this and other biweekly Arrow sync calls is:
> > > https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09
> > >
> > > Alternatively, enter this information into the Zoom website or app to
> > > join the call:
> > > Meeting ID: 876 4903 3008
> > > Passcode: 958092
> > >
> > > Thanks,
> > > Ian
> >
