[ 
https://issues.apache.org/jira/browse/ARROW-9742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17178562#comment-17178562
 ] 

Jörn Horstmann commented on ARROW-9742:
---------------------------------------

Hi [~andygrove], in the call I was thinking more about the compute kernels 
that would be used by different DataFrame implementations. I know of the 
following two projects; both seem to implement common arithmetic 
operations themselves instead of reusing the functions defined in Arrow:

Polars: 
[https://github.com/ritchie46/polars/blob/master/polars/src/chunked_array/arithmetic.rs]

Rust Dataframe: 
[https://github.com/nevi-me/rust-dataframe/blob/master/src/functions/scalar.rs]

I don't think this was done for performance reasons, as the current packed_simd 
implementations should be quite fast. Maybe it's more of a marketing problem: 
people may not know that the Rust Arrow implementation contains those 
kernels and not just the array data structures.

Nevertheless, I think having a common DataFrame implementation inside Arrow 
makes sense, especially since the implementation can reuse all of the existing 
DataFusion and LogicalPlan machinery.
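
To illustrate the idea, here is a minimal sketch, assuming a hypothetical 
DataFrame trait whose methods only build up a logical plan that an engine 
could execute later. The names `LogicalPlan`, `DataFrame`, `PlanDataFrame`, 
`select`, `limit` and `example_plan` below are invented for illustration; 
they are not the existing Arrow or DataFusion API.

```rust
// Hypothetical sketch only: none of these types are the real Arrow/DataFusion API.

/// Tiny stand-in for a logical plan; each DataFrame call wraps the previous node.
#[derive(Debug, Clone, PartialEq)]
enum LogicalPlan {
    Scan { table: String },
    Projection { input: Box<LogicalPlan>, columns: Vec<String> },
    Limit { input: Box<LogicalPlan>, n: usize },
}

/// The shared trait that different engines (local, distributed, GPU) could implement.
trait DataFrame {
    /// Keep only the named columns.
    fn select(&self, columns: &[&str]) -> Box<dyn DataFrame>;
    /// Keep at most `n` rows.
    fn limit(&self, n: usize) -> Box<dyn DataFrame>;
    /// Expose the plan built so far, so that an engine can execute it.
    fn to_logical_plan(&self) -> LogicalPlan;
}

/// One possible implementation: it records the plan and executes nothing.
struct PlanDataFrame {
    plan: LogicalPlan,
}

impl PlanDataFrame {
    /// Start a plan from a named table.
    fn scan(table: &str) -> Self {
        PlanDataFrame {
            plan: LogicalPlan::Scan { table: table.to_string() },
        }
    }
}

impl DataFrame for PlanDataFrame {
    fn select(&self, columns: &[&str]) -> Box<dyn DataFrame> {
        Box::new(PlanDataFrame {
            plan: LogicalPlan::Projection {
                input: Box::new(self.plan.clone()),
                columns: columns.iter().map(|c| c.to_string()).collect(),
            },
        })
    }

    fn limit(&self, n: usize) -> Box<dyn DataFrame> {
        Box::new(PlanDataFrame {
            plan: LogicalPlan::Limit {
                input: Box::new(self.plan.clone()),
                n,
            },
        })
    }

    fn to_logical_plan(&self) -> LogicalPlan {
        self.plan.clone()
    }
}

/// Example usage: scan table "t", project two columns, keep ten rows.
fn example_plan() -> LogicalPlan {
    PlanDataFrame::scan("t")
        .select(&["a", "b"])
        .limit(10)
        .to_logical_plan()
}
```

Because callers only see `Box<dyn DataFrame>`, another project could supply 
its own implementation of the trait and still consume the same plan 
representation.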

> [Rust] Create one standard DataFrame API
> ----------------------------------------
>
>                 Key: ARROW-9742
>                 URL: https://issues.apache.org/jira/browse/ARROW-9742
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust
>            Reporter: Andy Grove
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.0.0
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
>  There was a discussion in the last Arrow sync call about the fact that there are 
> numerous Rust DataFrame projects and that it would be good to have one standard 
> in the Arrow repo.
> I do think it would be good to have a DataFrame trait in Arrow, with an 
> implementation in DataFusion, making it possible for other projects to 
> extend or replace the implementation, e.g. for distributed compute or for GPU 
> compute. 
> [~jhorstmann] Does this capture what you were suggesting in the call?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
