On 4/25/22 2:49 PM, David Li wrote:
Following up here:
N.B. The Voltron Data folks have a scheduling conflict on 4/27 and will not be
able to host the fortnightly sync call. Is anyone available to run the meeting
that day?
Is anyone available to run the sync call this Wednesday?
On Wed, Apr 13, 2022, at 12:59, David Li wrote:
Attendees:
- David Li
- Eduardo Ponce
- Gavin Ray
- Ian Cook
- James Duong
- Matthew Topol
- Nic
- Niranda
- Raul Cumplido
- Rok
- Weston Pace
- Will Jones
N.B. The Voltron Data folks have a scheduling conflict on 4/27 and will
not be able to host the fortnightly sync call. Is anyone available to
run the meeting that day?
Agenda:
8.0.0 Release: targeting 4/21; please try to get PRs wrapped up in the
next ~1-2 weeks. See the ML post [1] for details, including a wiki page
listing outstanding issues. In particular, there are some Go PRs that
could use attention from an interested Go developer [2], as well as
some temporal kernel PRs that could use a review [3].
Arrow C++ Compute Engine: Weston gave a status update. APIs and
documentation have been improved for users, though most will likely
use the engine through an interface like Substrait. Basic Substrait
support has been added, with improvements forthcoming. More tooling to
measure performance is being worked on, and general kernel execution
overhead is being addressed with an eye towards running smaller batches
through the engine. An asof join implementation is being worked on, and
the Go implementation is working towards Substrait bindings so that it
can bind to the C++ engine.
Kernel vectorization/SIMD: Eduardo has been looking at making some of
the primitive kernels (e.g. arithmetic) more easily autovectorized by
the compiler, testing a variety of approaches. See related discussion
[4]. We do not have benchmarks to evaluate compiler performance in this
regard generally, but we have manually inspected some compiler output
and found that not all compilers manage to do this with the current
kernel implementations. We also don't have a holistic way to evaluate
this going forward, nor do we have a sense for current benchmark
coverage, though possibly we could generate benchmarks. However, it was
pointed out that general engine performance is likely more important,
and that current profiling indicates kernels are not yet a bottleneck,
though there may be low-hanging fruit here.
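To illustrate the kind of loop under discussion, here is a minimal,
self-contained sketch of an element-wise addition kernel written so
that a compiler has a good chance of autovectorizing it. AddInt32 and
AddVectors are hypothetical names for illustration only; this is not
Arrow's kernel code, and it assumes both inputs have the same length
and contain no nulls.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not an Arrow API. The loop body has no branches
// and no dependency between iterations, and it reads and writes
// contiguous buffers, which is the shape autovectorizers handle best.
void AddInt32(const int32_t* left, const int32_t* right, int32_t* out,
              std::size_t length) {
  for (std::size_t i = 0; i < length; ++i) {
    // Add as unsigned so that overflow wraps around instead of being
    // undefined behavior for signed integers.
    out[i] = static_cast<int32_t>(static_cast<uint32_t>(left[i]) +
                                  static_cast<uint32_t>(right[i]));
  }
}

// Convenience wrapper for illustration; assumes right.size() >= left.size().
std::vector<int32_t> AddVectors(const std::vector<int32_t>& left,
                                const std::vector<int32_t>& right) {
  std::vector<int32_t> out(left.size());
  AddInt32(left.data(), right.data(), out.data(), left.size());
  return out;
}
```

Whether a given compiler actually vectorizes such a loop can be checked
from its diagnostics, for example with GCC's -O3 -fopt-info-vec or
Clang's -O2 -Rpass=loop-vectorize, or by inspecting the generated
assembly, as was done manually in the discussion above.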
Flight/Flight SQL: we discussed the barriers to Flight SQL support in
Go; Flight SQL heavily uses union types, which are not yet implemented
in Go. A further proposal [5] has been submitted to extend the type
metadata; those interested, please take a look. The GetXdbcTypeInfo
proposal was merged, and the inline data proposal is still outstanding
(but probably ready for a vote).
IPC/Format: it was asked whether there is an IPC structure for
serializing a single array to reduce overhead. Current APIs likely
suffice, but Niranda may submit a separate discussion to explain further.
[1]: https://lists.apache.org/thread/zk8hhynvy0bqvqpxk0868n5g0nmzbzbn
[2]: https://github.com/apache/arrow/pull/12158
[3]: https://github.com/apache/arrow/pull/12657
[4]: https://lists.apache.org/thread/8o7k4dt23chx3gn13rwkms38syyms489
[5]: https://lists.apache.org/thread/thvn89wg29gyctwycx2zjr4vvm2g80o6
On Tue, Apr 12, 2022, at 16:17, Ian Cook wrote:
Hi all,
Our biweekly sync call is tomorrow at 12:00 noon Eastern time.
The Zoom meeting URL for this and other biweekly Arrow sync calls is:
https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09
Alternatively, enter this information into the Zoom website or app to
join the call:
Meeting ID: 876 4903 3008
Passcode: 958092
Thanks,
Ian