Also, it seems that DuckDB [1] is heading in the same direction by adding a DataFrame API to its database engine.
[1] https://github.com/duckdb/duckdb/issues/2000

On Thu, May 12, 2022 at 3:36 PM Andrew Lamb <al...@influxdata.com> wrote:

> For what it is worth, DataFusion has a DataFrame interface [1] that uses
> the same underlying `LogicalPlan` structures as the SQL interface.
> Unsurprisingly, it is heavily inspired by pandas.
>
> I believe this interface is more familiar and popular with
> DataFusion users who programmatically build plans (e.g. to implement a
> custom query language), even though we offer a `LogicalPlanBuilder` [2] as
> well.
>
> So I think there is value in a DataFrame API (one that wraps the C++ engine,
> for example). But I am not sure DataFrames are at the same level as the
> "Arrow Array" interface.
>
> Andrew
>
>
> [1]
> https://github.com/apache/arrow-datafusion/blob/6a69f529edb3087aeba57c9f01031a98ad06dd5d/datafusion/core/src/dataframe.rs
> [2]
> https://github.com/apache/arrow-datafusion/blob/6a69f529edb3087aeba57c9f01031a98ad06dd5d/datafusion/core/src/logical_plan/builder.rs#L58-L95
>
> On Thu, May 12, 2022 at 1:14 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
>> > Discussion about whether the community around Arrow would like to have
>> > DataFrame-like APIs for Arrow in more languages, for example C++
>>
>> We've discussed this a bit on the mailing list in the past, see
>>
>>
>> https://docs.google.com/document/d/1XHe_j87n2VHGzEbnLe786GHbbcbrzbjgG8D0IXWAeHg/edit#heading=h.g70gstc7jq4h
>>
>> for example. It's a complicated subject because the problems that need
>> solving in a "data frame library" are much more than defining an API:
>> they involve establishing execution and mutation/copy-on-write
>> semantics (the latter of which has been a huge topic of discussion in the
>> pandas community, for example).
>> The API would be driving an internal
>> data management logic engine (similar to pandas's internal logic
>> engine, but hopefully we could make something without as many
>> problems) which would manipulate chunks of in-memory and out-of-core
>> Arrow data internally.
>>
>> I still would be interested in an Arrow-native "data frame library"
>> similar to the SFrame library that's part of Apple's (now defunct?)
>> Turi Create library [1].
>>
>> It's a can of worms and not a problem to be approached lightly (thinking of
>> that "one does not simply..." meme right now), and it is best done in heavy
>> consultation with communities that have experience supporting
>> production use of data frames for data science use cases for many
>> years.
>>
>> [1]: https://github.com/apple/turicreate
>>
>> On Wed, May 11, 2022 at 11:38 PM Ian Cook <i...@ursacomputing.com> wrote:
>> >
>> > Attendees:
>> >
>> > Joris Van den Bossche
>> > Ian Cook
>> > Nic Crane
>> > Raul Cumplido
>> > Ian Joiner
>> > David Li
>> > Rok Mihevc
>> > Dragoș Moldovan-Grünfeld
>> > Aldrin Montana
>> > Weston Pace
>> > Eduardo Ponce
>> > Matthew Topol
>> > Jacob Wujciak
>> >
>> >
>> > Discussion:
>> >
>> > Eduardo: Draft PR with a guide showing how to create a new Arrow C++
>> > compute kernel [1]
>> > - Review requested
>> >
>> > Weston: Proposed changes to ExecPlan in Arrow C++ compute engine [2]
>> > - Feedback requested on details described in the Jira
>> >
>> > Rok: Temporal rounding kernels option in Arrow C++ compute engine [3]
>> > - Feedback requested about what we should name it
>> > - Possibilities include ceil_on_boundary, ceil_is_strictly_greater,
>> > strict_ceil, is_strict_ceil, ceil_is_strict
>> > - Joris favors ceil_is_strictly_greater
>> >
>> > Ian C: Discussion about naming the Arrow C++ engine [4]
>> > - Comments welcome on the mailing list
>> >
>> > David: ADBC (Arrow Database Connectivity) proposal [5][6]
>> > - Feedback requested
>> >
>> > Ian C: Discussion about whether the community around Arrow would like
>> > to have DataFrame-like APIs for Arrow in more languages, for example
>> > C++
>> > - For C++, maybe this would look similar to xframe [7]
>> > - Probably better to approach projects like these outside of Arrow
>> > and have them produce plans in Substrait format [8], which the Arrow
>> > C++ engine (and other engines) could consume and execute
>> >
>> > Arrow 8.0.0 release
>> > - Most post-release tasks complete
>> > - Please contribute to the release blog post [9]
>> >
>> > Release process
>> > - Please comment on the proposed RC process change [10]
>> > - There is a discussion about changing to bimonthly major releases
>> > (every two months instead of quarterly, which is what we do now)
>> > - To make this work we would need nightly builds to be more stable;
>> > Raul and Jacob are working on this
>> >
>> > Should we publicly share a link that Arrow developers can use to join
>> > the Zulip chat?
>> > - Zulip has instructions for how to do this [11]
>> > - We would need a Zulip admin to change the permissions to enable
>> > this (Wes, Antoine, Weston, et al. are admins)
>> > - What about the ASF Slack [12]? Should we share the details about
>> > that?
>> > - The Slack has a rarely used Arrow channel and a Rust Arrow
>> > channel which is more popular
>> > - There were some doubts about whether committer permissions or the
>> > associated apache.org email address are required to join, but in fact
>> > anyone can join this Slack
>> > - Ian will follow up about this
>> >
>> > The Data Thread [13]
>> > - Voltron Data is hosting an Arrow-focused virtual conference on June 23
>> > - Registration and speaker applications are open
>> >
>> > [1] https://github.com/apache/arrow/pull/10296
>> > [2] https://issues.apache.org/jira/browse/ARROW-16522
>> > [3]
>> > https://github.com/apache/arrow/pull/12657/files#diff-6bc7ecec6a4f7bcefc2511cde3bd809340ad0d94bb8f7cc5f4994063c798f2faR124-R132
>> > [4] https://lists.apache.org/thread/02sdm4jmqg2z98kr1mg2yq13q858xbx6
>> > [5] https://lists.apache.org/thread/gnz1kz2rj3rb8rh8qz7l0mv8lvzq254w
>> > [6]
>> > https://docs.google.com/document/d/1t7NrC76SyxL_OffATmjzZs2xcj1owdUsIF2WKL_Zw1U/
>> > [7] https://xframe.readthedocs.io/en/latest/index.html
>> > [8] https://substrait.io
>> > [9] https://github.com/apache/arrow-site/pull/207
>> > [10] https://lists.apache.org/thread/g6mqpyq2hc11xbgrq2pf653njzy53plt
>> > [11] https://zulip.com/help/invite-new-users#create-an-invitation-link
>> > [12] https://the-asf.slack.com/
>> > [13] https://thedatathread.com
>> >
>> >
>> > On Wed, May 11, 2022 at 9:23 AM Ian Cook <i...@ursacomputing.com> wrote:
>> > >
>> > > Hi all,
>> > >
>> > > Our biweekly sync call is today at 12:00 noon Eastern time.
>> > >
>> > > The Zoom meeting URL for this and other biweekly Arrow sync calls is:
>> > > https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09
>> > >
>> > > Alternatively, enter this information into the Zoom website or app to
>> > > join the call:
>> > > Meeting ID: 876 4903 3008
>> > > Passcode: 958092
>> > >
>> > > Thanks,
>> > > Ian
>> >
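Andrew's point that a DataFrame interface can sit on the same logical plan structures as a SQL interface can be illustrated with a toy Python sketch. Everything in it (`Scan`, `Filter`, `Project`, `DataFrame`, `execute`, the `trips` table) is hypothetical and is not DataFusion's, DuckDB's, Substrait's, or Arrow's actual API. Each DataFrame method only wraps the current plan in a new immutable node, so a method chain and a hand-lowered SQL query end up as the same plan, which one engine then executes. Returning a new DataFrame instead of mutating the old one sidesteps, rather than solves, the mutation/copy-on-write questions Wes raises.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical logical-plan nodes for illustration only; these are NOT
# DataFusion's `LogicalPlan` variants or any Arrow/Substrait type.
@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class Filter:
    child: Any
    column: str
    op: str  # one of the keys of OPS below
    value: Any

@dataclass(frozen=True)
class Project:
    child: Any
    columns: tuple[str, ...]

OPS: dict[str, Callable[[Any, Any], bool]] = {
    "==": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}

class DataFrame:
    """Hypothetical DataFrame facade: every method wraps the current plan in a
    new node and returns a new DataFrame; nothing executes until collect()."""

    def __init__(self, plan: Any) -> None:
        self.plan = plan

    def filter(self, column: str, op: str, value: Any) -> DataFrame:
        return DataFrame(Filter(self.plan, column, op, value))

    def select(self, *columns: str) -> DataFrame:
        return DataFrame(Project(self.plan, tuple(columns)))

    def collect(self, catalog: dict[str, list[dict]]) -> list[dict]:
        return execute(self.plan, catalog)

def execute(plan: Any, catalog: dict[str, list[dict]]) -> list[dict]:
    """Toy row-at-a-time interpreter over lists of dicts; a real engine would
    operate on Arrow record batches instead."""
    if isinstance(plan, Scan):
        return list(catalog[plan.table])
    if isinstance(plan, Filter):
        test = OPS[plan.op]
        return [row for row in execute(plan.child, catalog)
                if test(row[plan.column], plan.value)]
    if isinstance(plan, Project):
        return [{c: row[c] for c in plan.columns}
                for row in execute(plan.child, catalog)]
    raise TypeError(f"unknown plan node: {plan!r}")

# Usage: a method chain and a (hand-lowered) SQL query yield the same plan.
catalog = {"trips": [{"city": "NYC", "fare": 12.5},
                     {"city": "SF", "fare": 30.0},
                     {"city": "NYC", "fare": 41.0}]}
base = DataFrame(Scan("trips"))
df = base.filter("fare", ">", 20).select("city")

# What a SQL front end might emit for: SELECT city FROM trips WHERE fare > 20
sql_plan = Project(Filter(Scan("trips"), "fare", ">", 20), ("city",))

print(df.plan == sql_plan)   # True
print(df.collect(catalog))   # [{'city': 'SF'}, {'city': 'NYC'}]
print(base.plan)             # Scan(table='trips')
```

Because both front ends lower to a single plan representation, only that representation needs an optimizer and executor. This is roughly the reasoning behind the meeting-notes suggestion that DataFrame projects live outside Arrow and emit Substrait plans for the C++ engine (or other engines) to consume.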