alamb opened a new issue #197: URL: https://github.com/apache/arrow-datafusion/issues/197
*Note*: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-12293

I am learning DataFusion and tried to write the canonical big-data version of hello world, word count, using DataFusion. I have been unsuccessful, and I am wondering whether word count is currently possible with DataFusion at all.

Word count typically involves a `flat_map` that splits each string on the whitespace it contains. I am running into two issues:

1) Creating a UDF that goes from `&str` -> `Vec<&str>`. I cannot find an `arrow::array` type that maps to a collection of strings, which prevents me from creating a UDF that performs the split.
2) Assuming I could get (1) to work, I am not aware of a method similar to `flat_map` that can be performed on a column. In SQL I believe this is called `explode`, which I can't find in the codebase, so I suspect flat_map-style operations aren't possible.

My questions are:

- Is word count currently possible in DataFusion? If so, how can I perform the split, and how can I perform a flat_map?
- If word count cannot be done, what would need to be implemented to make it possible?
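For reference, the computation being asked for can be sketched in plain Rust with standard-library iterators only. This does not use any DataFusion or Arrow API; the helper names `explode_words` and `word_count` are hypothetical and only illustrate the two steps a query engine would need: a split-and-flatten ("explode") step, followed by a group-by-word count.

```rust
use std::collections::HashMap;

/// Hypothetical helper: split every line on whitespace and flatten the
/// results into one sequence of words (the flat_map / "explode" step).
fn explode_words<'a>(lines: &[&'a str]) -> Vec<&'a str> {
    lines
        .iter()
        .copied()
        .flat_map(|line| line.split_whitespace())
        .collect()
}

/// Hypothetical helper: count occurrences of each word
/// (the equivalent of GROUP BY word with COUNT(*)).
fn word_count<'a>(lines: &[&'a str]) -> HashMap<&'a str, usize> {
    let mut counts = HashMap::new();
    for word in explode_words(lines) {
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let lines = ["the quick brown fox", "the lazy dog"];
    // Sort the (word, count) pairs so the output order is deterministic.
    let mut sorted: Vec<(&str, usize)> = word_count(&lines).into_iter().collect();
    sorted.sort();
    for (word, n) in sorted {
        println!("{word}: {n}");
    }
}
```

In DataFusion terms, the first step needs a function returning a list of strings per row plus an operator that turns each list element into its own row; the second step is an ordinary aggregation.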
