[ 
https://issues.apache.org/jira/browse/ARROW-8813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108422#comment-17108422
 ] 

Dominic Dennenmoser edited comment on ARROW-8813 at 5/15/20, 4:04 PM:
----------------------------------------------------------------------

Thanks for referring to that. I've just looked for issues or pull requests 
mentioning anything in that direction. Fortunately, a generic version of 
{{pivot_[longer|wider]()}} will be available in the upcoming version of 
{{tidyr}}, and is already implemented in the development version 
([#800|https://github.com/tidyverse/tidyr/issues/800]).



> [R] Implementing tidyr interface
> --------------------------------
>
>                 Key: ARROW-8813
>                 URL: https://issues.apache.org/jira/browse/ARROW-8813
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Dominic Dennenmoser
>            Priority: Major
>              Labels: extension, feature, improvement
>
> I think it would be reasonable to implement an interface to the {{tidyr}} 
> package. Such an interface would allow Arrow Tables to be processed lazily 
> before they are pulled into memory. Currently, however, you need to collect 
> the table before applying tidyr methods. The following code chunk shows an 
> example routine:
> {code:r}
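> library(magrittr)  # provides the %>% pipe
> # Read the file as an Arrow Table instead of a data frame
> arrow_table <- arrow::read_feather("table.feather", as_data_frame = FALSE)
> nested_df <-
>    arrow_table %>%
>    dplyr::select(ID, 4:7, Value) %>%
>    dplyr::filter(Value >= 5) %>%
>    dplyr::group_by(ID) %>%
>    dplyr::collect() %>%  # pull the result into memory as a data frame
>    tidyr::nest()         # tidyr is applied only after collecting
> {code}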
> The main focus might be the following three methods:
>  * {{tidyr::[un]nest()}},
>  * {{tidyr::pivot_[longer|wider]()}}, and
>  * {{tidyr::separate()}}.
> I suppose the last two can be implemented fairly quickly, but 
> {{tidyr::nest()}} and {{tidyr::unnest()}} cannot be implemented until 
> conversion to List<Struct> is available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
