> Discussion about whether the community around Arrow would like to have 
> DataFrame-like APIs for Arrow in more languages, for example C++

We've discussed this a bit on the mailing list in the past, see

https://docs.google.com/document/d/1XHe_j87n2VHGzEbnLe786GHbbcbrzbjgG8D0IXWAeHg/edit#heading=h.g70gstc7jq4h

for example. It's a complicated subject because the problems that need
solving in a "data frame library" are much more than defining an API —
they involve establishing execution and mutation/copy-on-write
semantics (the latter of which has been a huge topic of discussion in the
pandas community, for example). The API would be driving an internal
data management logic engine (similar to pandas's internal logic
engine — but hopefully we could make something without as many
problems) which would manipulate chunks of in-memory and out-of-core
Arrow data internally.
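
To make the copy-on-write point concrete, here is a minimal sketch in
plain Python. It is only an illustration of the semantics in question,
not how pandas or Arrow implement anything: the `_Buffer` and
`CowColumn` names are hypothetical, and the reference counting is
deliberately simplified.

```python
class _Buffer:
    """Hypothetical shared storage with a manual reference count."""

    def __init__(self, values):
        self.values = list(values)
        self.refs = 1


class CowColumn:
    """Hypothetical column: copies are cheap, writes detach shared data."""

    def __init__(self, values):
        self._buf = _Buffer(values)

    def copy(self):
        # Cheap "copy": share the same buffer and bump its reference count.
        other = CowColumn.__new__(CowColumn)
        other._buf = self._buf
        self._buf.refs += 1
        return other

    def shares_memory_with(self, other):
        return self._buf is other._buf

    def __getitem__(self, i):
        return self._buf.values[i]

    def __setitem__(self, i, value):
        if self._buf.refs > 1:
            # Another column still sees this buffer, so detach first:
            # release our reference and take a private copy of the data.
            self._buf.refs -= 1
            self._buf = _Buffer(self._buf.values)
        self._buf.values[i] = value
        # Simplification: references are not released when a column is
        # garbage collected, which a real engine would have to handle.


a = CowColumn([1, 2, 3])
b = a.copy()      # no data copied yet
b[0] = 99         # first write triggers the actual copy
print(a[0], b[0])
```

Even in a toy like this, the library has to decide when a write is
allowed to happen in place and when it must copy, which is the kind of
decision that sits below the API.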

I still would be interested in an Arrow-native "data frame library"
similar to the SFrame library that's part of Apple's (now defunct?)
Turi Create library [1].

It's a can of worms and a problem not to be approached lightly (thinking
of that "one does not simply..." meme right now) and best done in heavy
consultation with communities that have experience supporting
production use of data frames for data science use cases for many
years.

[1]: https://github.com/apple/turicreate

On Wed, May 11, 2022 at 11:38 PM Ian Cook <i...@ursacomputing.com> wrote:
>
> Attendees:
>
> Joris Van den Bossche
> Ian Cook
> Nic Crane
> Raul Cumplido
> Ian Joiner
> David Li
> Rok Mihevc
> Dragoș Moldovan-Grünfeld
> Aldrin Montana
> Weston Pace
> Eduardo Ponce
> Matthew Topol
> Jacob Wujciak
>
>
> Discussion:
>
> Eduardo: Draft PR with a guide showing how to create a new Arrow C++
> compute kernel [1]
>  - Review requested
>
> Weston: Proposed changes to ExecPlan in Arrow C++ compute engine [2]
>  - Feedback requested on details described in the Jira
>
> Rok: Temporal rounding kernels option in Arrow C++ compute engine [3]
>  - Feedback requested about what we should name it
>  - Possibilities include ceil_on_boundary, ceil_is_strictly_greater,
> strict_ceil, is_strict_ceil, ceil_is_strict
>  - Joris favors ceil_is_strictly_greater
>
> Ian C: Discussion about naming the Arrow C++ engine [4]
>  - Comments welcome on the mailing list
>
> David: ADBC (Arrow Database Connectivity) proposal [5][6]
>  - Feedback requested
>
> Ian C: Discussion about whether the community around Arrow would like
> to have DataFrame-like APIs for Arrow in more languages, for example
> C++
>  - For C++, maybe this would look similar to xframe [7]
>  - Probably better to approach projects like these outside of Arrow
> and have them produce plans in Substrait format [8] which the Arrow
> C++ engine (and other engines) could consume and execute
>
> Arrow 8.0.0 release
>  - Most post-release tasks complete
>  - Please contribute to the release blog post [9]
>
> Release process
>  - Please comment on the proposed RC process change [10]
>  - There is a discussion about changing to bimonthly major releases
> (instead of quarterly, which is what we do now)
>  - To make this work we would need nightly builds to be more stable;
> Raul and Jacob are working on this
>
> Should we publicly share a link that Arrow developers can use to join
> the Zulip chat?
>  - Zulip has instructions for how to do this [11]
>  - We would need a Zulip admin to change the permissions to enable
> this (Wes, Antoine, Weston, et al. are admins)
>  - What about the ASF Slack [12]? Should we share the details about that?
>    - The Slack has a rarely used Arrow channel and a Rust Arrow
> channel which is more popular
>    - There were some doubts about whether committer permissions or the
> associated apache.org email address are required to join, but in fact
> anyone can join this Slack
>  - Ian will follow up about this
>
> The Data Thread [13]
>  - Voltron Data is hosting an Arrow-focused virtual conference on June 23
>  - Registration and speaker applications are open
>
> [1] https://github.com/apache/arrow/pull/10296
> [2] https://issues.apache.org/jira/browse/ARROW-16522
> [3] https://github.com/apache/arrow/pull/12657/files#diff-6bc7ecec6a4f7bcefc2511cde3bd809340ad0d94bb8f7cc5f4994063c798f2faR124-R132
> [4] https://lists.apache.org/thread/02sdm4jmqg2z98kr1mg2yq13q858xbx6
> [5] https://lists.apache.org/thread/gnz1kz2rj3rb8rh8qz7l0mv8lvzq254w
> [6] https://docs.google.com/document/d/1t7NrC76SyxL_OffATmjzZs2xcj1owdUsIF2WKL_Zw1U/
> [7] https://xframe.readthedocs.io/en/latest/index.html
> [8] https://substrait.io
> [9] https://github.com/apache/arrow-site/pull/207
> [10] https://lists.apache.org/thread/g6mqpyq2hc11xbgrq2pf653njzy53plt
> [11] https://zulip.com/help/invite-new-users#create-an-invitation-link
> [12] https://the-asf.slack.com/
> [13] https://thedatathread.com
>
>
> On Wed, May 11, 2022 at 9:23 AM Ian Cook <i...@ursacomputing.com> wrote:
> >
> > Hi all,
> >
> > Our biweekly sync call is today at 12:00 noon Eastern time.
> >
> > The Zoom meeting URL for this and other biweekly Arrow sync calls is:
> > https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09
> >
> > Alternatively, enter this information into the Zoom website or app to
> > join the call:
> > Meeting ID: 876 4903 3008
> > Passcode: 958092
> >
> > Thanks,
> > Ian
