For what it is worth, DataFusion has a DataFrame interface [1] that uses the same underlying `LogicalPlan` structures as the SQL interface. Unsurprisingly, it is heavily inspired by pandas.
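
To illustrate the idea of two front ends sharing one plan representation, here is a minimal, std-only sketch. Every type and function in it (`LogicalPlan`, `DataFrame`, `plan_sql`) is a hypothetical toy written for this example; none of it is DataFusion's actual API, and the "SQL" handling only understands one fixed statement shape.

```rust
// Toy model only: these types are hypothetical stand-ins, NOT DataFusion's API.

/// A tiny logical plan tree shared by both front ends.
#[derive(Debug, Clone, PartialEq)]
pub enum LogicalPlan {
    Scan { table: String },
    Filter { input: Box<LogicalPlan>, predicate: String },
    Projection { input: Box<LogicalPlan>, columns: Vec<String> },
}

/// DataFrame-style front end: each method wraps the current plan in a new node.
#[derive(Debug, Clone)]
pub struct DataFrame {
    plan: LogicalPlan,
}

impl DataFrame {
    /// Start from a scan of the named table.
    pub fn table(name: &str) -> Self {
        DataFrame { plan: LogicalPlan::Scan { table: name.to_string() } }
    }

    /// Keep only rows matching the predicate (stored as plain text in this toy).
    pub fn filter(self, predicate: &str) -> Self {
        DataFrame {
            plan: LogicalPlan::Filter {
                input: Box::new(self.plan),
                predicate: predicate.to_string(),
            },
        }
    }

    /// Keep only the named columns.
    pub fn select(self, columns: &[&str]) -> Self {
        DataFrame {
            plan: LogicalPlan::Projection {
                input: Box::new(self.plan),
                columns: columns.iter().map(|c| c.to_string()).collect(),
            },
        }
    }

    /// Hand the finished plan to whatever would execute it.
    pub fn into_plan(self) -> LogicalPlan {
        self.plan
    }
}

/// SQL-style front end. It only handles the exact shape
/// `SELECT <cols> FROM <table> [WHERE <predicate>]` and returns None otherwise.
pub fn plan_sql(sql: &str) -> Option<LogicalPlan> {
    let rest = sql.trim().strip_prefix("SELECT ")?;
    let (cols, rest) = rest.split_once(" FROM ")?;
    let (table, predicate) = match rest.split_once(" WHERE ") {
        Some((t, p)) => (t, Some(p)),
        None => (rest, None),
    };

    let mut plan = LogicalPlan::Scan { table: table.trim().to_string() };
    if let Some(p) = predicate {
        plan = LogicalPlan::Filter {
            input: Box::new(plan),
            predicate: p.trim().to_string(),
        };
    }
    let columns = cols.split(',').map(|c| c.trim().to_string()).collect();
    Some(LogicalPlan::Projection { input: Box::new(plan), columns })
}
```

In this toy, `DataFrame::table("t").filter("a > 1").select(&["a", "b"]).into_plan()` and `plan_sql("SELECT a, b FROM t WHERE a > 1")` build the same tree, so anything downstream of the plan does not need to know which front end produced it.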

I believe this interface is more familiar to, and more popular with, DataFusion users who build plans programmatically (e.g. to implement a custom query language), even though we offer a `LogicalPlanBuilder` [2] as well.

So I think there is value in a DataFrame API (that wraps the C++ engine, for example). But I am not sure DataFrames are at the same level as the "Arrow Array" interface.

Andrew

[1] https://github.com/apache/arrow-datafusion/blob/6a69f529edb3087aeba57c9f01031a98ad06dd5d/datafusion/core/src/dataframe.rs
[2] https://github.com/apache/arrow-datafusion/blob/6a69f529edb3087aeba57c9f01031a98ad06dd5d/datafusion/core/src/logical_plan/builder.rs#L58-L95

On Thu, May 12, 2022 at 1:14 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> Discussion about whether the community around Arrow would like to have
> DataFrame-like APIs for Arrow in more languages, for example C++
>
> We've discussed this a bit on the mailing list in the past, see
>
> https://docs.google.com/document/d/1XHe_j87n2VHGzEbnLe786GHbbcbrzbjgG8D0IXWAeHg/edit#heading=h.g70gstc7jq4h
>
> for example. It's a complicated subject because the problems that need
> solving in a "data frame library" are much more than defining an API:
> they involve establishing execution and mutation/copy-on-write
> semantics (the latter of which has been a huge topic of discussion in
> the pandas community, for example). The API would be driving an
> internal data management logic engine (similar to pandas's internal
> logic engine, but hopefully we could make something without as many
> problems) which would manipulate chunks of in-memory and out-of-core
> Arrow data internally.
>
> I still would be interested in an Arrow-native "data frame library"
> similar to the SFrame library that's part of Apple's (now defunct?)
> Turi Create library [1]
>
> It's a can of worms but a problem not approached lightly (thinking of
> that "one does not simply..." meme right now) and best done in heavy
> consultation with communities that have experience supporting
> production use of data frames for data science use cases for many
> years.
>
> [1]: https://github.com/apple/turicreate
>
> On Wed, May 11, 2022 at 11:38 PM Ian Cook <i...@ursacomputing.com> wrote:
> >
> > Attendees:
> >
> > Joris Van den Bossche
> > Ian Cook
> > Nic Crane
> > Raul Cumplido
> > Ian Joiner
> > David Li
> > Rok Mihevc
> > Dragoș Moldovan-Grünfeld
> > Aldrin Montana
> > Weston Pace
> > Eduardo Ponce
> > Matthew Topol
> > Jacob Wujciak
> >
> >
> > Discussion:
> >
> > Eduardo: Draft PR with a guide showing how to create a new Arrow C++
> > compute kernel [1]
> > - Review requested
> >
> > Weston: Proposed changes to ExecPlan in Arrow C++ compute engine [2]
> > - Feedback requested on details described in the Jira
> >
> > Rok: Temporal rounding kernels option in Arrow C++ compute engine [3]
> > - Feedback requested about what we should name it
> > - Possibilities include ceil_on_boundary, ceil_is_strictly_greater,
> > strict_ceil, ceil_is_strictly_greater, is_strict_ceil, ceil_is_strict
> > - Joris favors ceil_is_strictly_greater
> >
> > Ian C: Discussion about naming the Arrow C++ engine [4]
> > - Comments welcome on the mailing list
> >
> > David: ADBC (Arrow Database Connectivity) proposal [5][6]
> > - Feedback requested
> >
> > Ian C: Discussion about whether the community around Arrow would like
> > to have DataFrame-like APIs for Arrow in more languages, for example
> > C++
> > - For C++, maybe this would look similar to xframe [7]
> > - Probably better to approach projects like these outside of Arrow
> > and have them produce plans in Substrait format [8] which the Arrow
> > C++ engine (and other engines) could consume and execute
> >
> > Arrow 8.0.0 release
> > - Most post-release tasks complete
> > - Please contribute to the release blog post [9]
> >
> > Release process
> > - Please comment on the proposed RC process change [10]
> > - There is a discussion about changing to bimonthly major releases
> > (instead of quarterly, which is what we do now)
> > - To make this work we would need nightly builds to be more stable;
> > Raul and Jacob are working on this
> >
> > Should we publicly share a link that Arrow developers can use to join
> > the Zulip chat?
> > - Zulip has instructions for how to do this [11]
> > - We would need a Zulip admin to change the permissions to enable
> > this (Wes, Antonie, Weston, et al. are admins)
> > - What about the ASF Slack [12]? Should we share the details about
> > that?
> > - The Slack has a rarely used Arrow channel and a Rust Arrow
> > channel which is more popular
> > - There were some doubts about whether committer permissions or the
> > associated apache.org email address are required to join, but in fact
> > anyone can join this Slack
> > - Ian will follow up about this
> >
> > The Data Thread [13]
> > - Voltron Data is hosting an Arrow-focused virtual conference on June 23
> > - Registration and speaker applications are open
> >
> > [1] https://github.com/apache/arrow/pull/10296
> > [2] https://issues.apache.org/jira/browse/ARROW-16522
> > [3] https://github.com/apache/arrow/pull/12657/files#diff-6bc7ecec6a4f7bcefc2511cde3bd809340ad0d94bb8f7cc5f4994063c798f2faR124-R132
> > [4] https://lists.apache.org/thread/02sdm4jmqg2z98kr1mg2yq13q858xbx6
> > [5] https://lists.apache.org/thread/gnz1kz2rj3rb8rh8qz7l0mv8lvzq254w
> > [6] https://docs.google.com/document/d/1t7NrC76SyxL_OffATmjzZs2xcj1owdUsIF2WKL_Zw1U/
> > [7] https://xframe.readthedocs.io/en/latest/index.html
> > [8] https://substrait.io
> > [9] https://github.com/apache/arrow-site/pull/207
> > [10] https://lists.apache.org/thread/g6mqpyq2hc11xbgrq2pf653njzy53plt
> > [11] https://zulip.com/help/invite-new-users#create-an-invitation-link
> > [12] https://the-asf.slack.com/
> > [13] https://thedatathread.com
> >
> >
> > On Wed, May 11, 2022 at 9:23 AM Ian Cook <i...@ursacomputing.com> wrote:
> > >
> > > Hi all,
> > >
> > > Our biweekly sync call is today at 12:00 noon Eastern time.
> > >
> > > The Zoom meeting URL for this and other biweekly Arrow sync calls is:
> > > https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09
> > >
> > > Alternatively, enter this information into the Zoom website or app to
> > > join the call:
> > > Meeting ID: 876 4903 3008
> > > Passcode: 958092
> > >
> > > Thanks,
> > > Ian