alamb opened a new issue #197:
URL: https://github.com/apache/arrow-datafusion/issues/197


   *Note*: migrated from original JIRA: 
https://issues.apache.org/jira/browse/ARROW-12293
   
   I am learning DataFusion and tried to implement word count, the canonical 
big data version of hello world.  I have been unsuccessful, and I am 
wondering whether word count is even possible with DataFusion at the moment.
   
   Typically, word count involves a flat_map in which each string is split on 
the whitespace it contains.  
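
   For reference, the flat_map-style word count described above looks roughly 
like this in plain Rust.  This is only a sketch of the intended semantics using 
the standard library, not DataFusion code; the function name `word_count` is 
made up for illustration.

```rust
use std::collections::HashMap;

/// Count whitespace-separated words across all input lines.
/// `word_count` is a hypothetical helper, not part of DataFusion or Arrow.
fn word_count(lines: &[&str]) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    // flat_map turns each line into zero or more words and
    // concatenates the results into a single stream of words.
    for word in lines.iter().flat_map(|line| line.split_whitespace()) {
        *counts.entry(word.to_string()).or_insert(0) += 1;
    }
    counts
}
```

   Calling `word_count(&["hello world", "hello datafusion"])` would yield a map 
with three entries, with `hello` counted twice.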
   There are two issues I am running into:
   
   1) Creating a UDF that goes from `&str` to `Vec<&str>`.  I cannot find an 
`arrow::array` type that maps to a collection of strings, which prevents me 
from creating a UDF that can perform the split.
   
   2) Assuming I could get (1) to work, I am not aware of a method similar to 
flat_map that can be applied to a column.  In SQL, I believe this is called 
`explode`.  I can't find it in the codebase, which makes me think flat_map 
style operations aren't possible.
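
   To make the two missing pieces concrete, here is a rough plain-Rust model of 
what I mean: a split step that produces one list of words per row, followed by 
an `explode` step that emits one row per list element.  The names `split_rows` 
and `explode` are hypothetical and only illustrate the semantics; they are not 
existing DataFusion or Arrow functions.

```rust
/// Step 1 (the UDF): split each row into a list of words,
/// producing one list per input row.
/// Hypothetical helper, shown only to illustrate the semantics.
fn split_rows<'a>(rows: &[&'a str]) -> Vec<Vec<&'a str>> {
    rows.iter()
        .map(|row| row.split_whitespace().collect())
        .collect()
}

/// Step 2 (the flat_map / explode): flatten the list column so that
/// every list element becomes its own output row.
/// Hypothetical helper, shown only to illustrate the semantics.
fn explode<'a>(lists: &[Vec<&'a str>]) -> Vec<&'a str> {
    lists.iter().flatten().copied().collect()
}
```

   With these two steps in place, word count would reduce to a group-by and 
count over the exploded column.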
   My questions are:
   
   Is word count currently possible in DataFusion?  If so, how can I perform 
the split, and how can I perform a flat_map?  If word count cannot be done, 
what would need to be implemented to make it possible?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

