[ https://issues.apache.org/jira/browse/ARROW-12293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332404#comment-17332404 ]

Andrew Lamb commented on ARROW-12293:
-------------------------------------

Migrated to github: https://github.com/apache/arrow-datafusion/issues/197

> [Rust][DataFusion] Word Count
> -----------------------------
>
>                 Key: ARROW-12293
>                 URL: https://issues.apache.org/jira/browse/ARROW-12293
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: Rust - DataFusion
>            Reporter: Jacob Baumbach
>            Priority: Trivial
>              Labels: newbie, question
>
> I am learning DataFusion and tried to implement word count, the canonical
> big-data version of "hello world". I have been unsuccessful, and I am
> wondering whether word count is currently possible with DataFusion.
>
> Typically, word count involves a flat_map in which each string is split on
> the whitespace it contains.
>
> I am running into two issues:
> 1) Creating a UDF that goes from `&str` -> `Vec<&str>`. I cannot find an
> `arrow::array` type that maps to a collection of strings, which prevents me
> from creating a UDF that can perform the split.
> 2) Even if I could get (1) to work, I am not aware of a method similar to
> flat_map that can be applied to a column. In SQL, I believe this operation
> is called `explode`, but I can't find it in the codebase, which makes me
> think flat_map-style operations aren't possible.
>
> My questions are:
> Is word count currently possible in DataFusion? If so, how can I perform the
> split, and how can I perform a flat_map? If word count cannot be done, what
> would need to be implemented to make it possible?
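For illustration only, the split + flat_map pipeline described in the question can be sketched in plain Rust over in-memory strings, using only the standard library (no DataFusion or Arrow). `word_count` is a hypothetical helper name, not part of any DataFusion API; `split_whitespace` stands in for the split UDF and `flat_map` for `explode`.

```rust
use std::collections::HashMap;

/// Hypothetical helper (std only, not a DataFusion API): counts how often
/// each whitespace-separated word occurs across all input lines.
pub fn word_count(lines: &[&str]) -> HashMap<String, usize> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    // `split_whitespace` plays the role of the split UDF (&str -> words);
    // `flat_map` flattens every line's words into one stream, like `explode`.
    for word in lines.iter().flat_map(|line| line.split_whitespace()) {
        // Group by word and count occurrences.
        *counts.entry(word.to_string()).or_insert(0) += 1;
    }
    counts
}
```

For example, `word_count(&["the quick brown fox", "the lazy dog"])` maps `"the"` to 2 and every other word to 1.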



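On question 1: Arrow represents a column whose values are lists (for example, a list of UTF-8 strings, as in arrow-rs `ListArray`) as one flat child array of values plus an offsets array, where row `i` spans `values[offsets[i]..offsets[i + 1]]`. The sketch below models that layout with plain `Vec`s to show what a split UDF would produce and what an `explode` would do with it. `StringListColumn`, `split_column`, and `explode` are hypothetical names for illustration, not arrow-rs or DataFusion APIs.

```rust
/// Hypothetical, simplified model of an Arrow-style list-of-strings column:
/// row `i` holds `values[offsets[i]..offsets[i + 1]]`.
#[derive(Debug, PartialEq)]
pub struct StringListColumn {
    pub offsets: Vec<usize>,
    pub values: Vec<String>,
}

/// The "split UDF": maps each input string to the list of its
/// whitespace-separated words, producing one list per input row.
pub fn split_column(rows: &[&str]) -> StringListColumn {
    let mut offsets = vec![0];
    let mut values = Vec::new();
    for row in rows {
        values.extend(row.split_whitespace().map(str::to_string));
        // Each new offset marks where the next row's words begin.
        offsets.push(values.len());
    }
    StringListColumn { offsets, values }
}

/// The "explode": emits one (parent_row, word) pair per list element,
/// repeating the parent row index once for each element of its list.
pub fn explode(col: &StringListColumn) -> Vec<(usize, &str)> {
    let mut out = Vec::new();
    for (row, bounds) in col.offsets.windows(2).enumerate() {
        for word in &col.values[bounds[0]..bounds[1]] {
            out.push((row, word.as_str()));
        }
    }
    out
}
```

In this sketch a row whose list is empty produces no output rows; an engine could instead keep such rows with a null value, which is a design choice rather than something the layout dictates.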
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
