[ https://issues.apache.org/jira/browse/ARROW-9742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17178562#comment-17178562 ]
Jörn Horstmann commented on ARROW-9742:
---------------------------------------
Hi [~andygrove], in the call I was thinking more about the compute kernels
that could be shared by different dataframe implementations. I know of the
following two projects; both seem to implement common arithmetic
operations themselves instead of reusing the functions defined in Arrow:
Polars:
[https://github.com/ritchie46/polars/blob/master/polars/src/chunked_array/arithmetic.rs]
Rust Dataframe:
[https://github.com/nevi-me/rust-dataframe/blob/master/src/functions/scalar.rs]
I don't think this was done for performance reasons, as the current packed_simd
implementations should be quite fast. Maybe it's more of a marketing problem:
people may not know that the Rust Arrow implementation contains those
kernels and not just the array data structures.
Nevertheless, I think having a common DataFrame implementation inside Arrow
makes sense, especially since the implementation can reuse all of the existing
DataFusion and LogicalPlan machinery.
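As a rough illustration of the DataFrame trait idea from the issue description, a minimal sketch might look like the following. All names here (`DataFrame`, `InMemoryFrame`, `select`, `filter_gt`, `large_values`) are hypothetical and are not the Arrow or DataFusion API. To stay self-contained it uses plain `Vec<i64>` columns as a stand-in for Arrow arrays and kernels.

```rust
use std::collections::BTreeMap;

/// Hypothetical common interface; NOT the actual Arrow/DataFusion API.
/// Other backends (distributed, GPU, ...) would provide their own impl.
pub trait DataFrame: Sized {
    /// Keep only the named columns (unknown names are ignored).
    fn select(&self, names: &[&str]) -> Self;
    /// Keep rows where `column` is strictly greater than `threshold`.
    fn filter_gt(&self, column: &str, threshold: i64) -> Self;
    /// Number of rows in the frame.
    fn num_rows(&self) -> usize;
}

/// Assumed stand-in backend: named i64 columns of equal length held in memory.
#[derive(Debug, Clone, PartialEq)]
pub struct InMemoryFrame {
    columns: BTreeMap<String, Vec<i64>>,
}

impl InMemoryFrame {
    pub fn new(columns: Vec<(&str, Vec<i64>)>) -> Self {
        let columns: BTreeMap<String, Vec<i64>> = columns
            .into_iter()
            .map(|(name, values)| (name.to_string(), values))
            .collect();
        InMemoryFrame { columns }
    }

    pub fn column(&self, name: &str) -> Option<&[i64]> {
        self.columns.get(name).map(|v| v.as_slice())
    }
}

impl DataFrame for InMemoryFrame {
    fn select(&self, names: &[&str]) -> Self {
        let columns: BTreeMap<String, Vec<i64>> = self
            .columns
            .iter()
            .filter(|(name, _)| names.contains(&name.as_str()))
            .map(|(name, values)| (name.clone(), values.clone()))
            .collect();
        InMemoryFrame { columns }
    }

    fn filter_gt(&self, column: &str, threshold: i64) -> Self {
        // Build a row mask from the predicate column; a missing column keeps no rows.
        let mask: Vec<bool> = match self.columns.get(column) {
            Some(values) => values.iter().map(|v| *v > threshold).collect(),
            None => vec![false; self.num_rows()],
        };
        // Apply the same mask to every column so rows stay aligned.
        let columns: BTreeMap<String, Vec<i64>> = self
            .columns
            .iter()
            .map(|(name, values)| {
                let kept: Vec<i64> = values
                    .iter()
                    .zip(&mask)
                    .filter(|(_, keep)| **keep)
                    .map(|(v, _)| *v)
                    .collect();
                (name.clone(), kept)
            })
            .collect();
        InMemoryFrame { columns }
    }

    fn num_rows(&self) -> usize {
        self.columns.values().next().map_or(0, |v| v.len())
    }
}

/// Code written against the trait works with any backend that implements it.
pub fn large_values<D: DataFrame>(df: &D, column: &str, threshold: i64) -> D {
    df.filter_gt(column, threshold).select(&[column])
}
```

Because `large_values` depends only on the trait, a different backend could be swapped in without changing the calling code, which is the kind of extensibility the issue asks for.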
> [Rust] Create one standard DataFrame API
> ----------------------------------------
>
> Key: ARROW-9742
> URL: https://issues.apache.org/jira/browse/ARROW-9742
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust
> Reporter: Andy Grove
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.0.0
>
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> There was a discussion in the last Arrow sync call about the fact that there are
> numerous Rust DataFrame projects and it would be good to have one standard,
> in the Arrow repo.
> I do think it would be good to have a DataFrame trait in Arrow, with an
> implementation in DataFusion, so that other projects can extend or replace
> the implementation, e.g. for distributed compute or for GPU compute.
> [~jhorstmann] Does this capture what you were suggesting in the call?