[
https://issues.apache.org/jira/browse/ARROW-8813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108422#comment-17108422
]
Dominic Dennenmoser edited comment on ARROW-8813 at 5/15/20, 4:04 PM:
----------------------------------------------------------------------
Thanks for referring to that. I've just looked for issues or pull requests
mentioning anything in that direction. Fortunately, a generic version of
{{pivot_[longer|wider]()}} will be available in the upcoming version of
{{tidyr}}, and is already implemented in the development version
([#800|https://github.com/tidyverse/tidyr/issues/800]).
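Until Arrow Tables are supported directly, the collect-first route applies to pivoting as well. Below is a minimal sketch of that workaround, assuming a small made-up table; the columns {{ID}}, {{A}}, and {{B}} and the output names {{Name}} and {{Value}} are hypothetical and only serve as illustration:
{code:r}
library(magrittr)

# Hypothetical in-memory data standing in for a Feather file
df <- data.frame(ID = c(1, 1, 2), A = c(3, 6, 9), B = c(5, 7, 2))
arrow_table <- arrow::Table$create(df)

long_df <-
  arrow_table %>%
  dplyr::select(ID, A, B) %>%
  # The table has to be pulled into memory before tidyr can work on it
  dplyr::collect() %>%
  tidyr::pivot_longer(cols = c(A, B), names_to = "Name", values_to = "Value")
{code}
The pivot step here runs on an ordinary in-memory data frame, so it does not benefit from lazy evaluation on the Arrow side.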
> [R] Implementing tidyr interface
> --------------------------------
>
> Key: ARROW-8813
> URL: https://issues.apache.org/jira/browse/ARROW-8813
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Dominic Dennenmoser
> Priority: Major
> Labels: extension, feature, improvement
>
> I think it would be reasonable to implement an interface to the {{tidyr}}
> package. This would allow Arrow Tables to be processed lazily before they
> are pulled into memory. Currently, however, you need to collect the table
> first before applying tidyr methods. The following code chunk shows an
> example routine:
> {code:r}
> library(magrittr)
> arrow_table <- arrow::read_feather("table.feather", as_data_frame = FALSE)
> nested_df <-
>   arrow_table %>%
>   dplyr::select(ID, 4:7, Value) %>%
>   dplyr::filter(Value >= 5) %>%
>   dplyr::group_by(ID) %>%
>   dplyr::collect() %>%
>   tidyr::nest()
> {code}
> The main focus might be the following three methods:
> * {{tidyr::[un]nest()}},
> * {{tidyr::pivot_[longer|wider]()}}, and
> * {{tidyr::separate()}}.
> I suppose the last two can be implemented fairly quickly, but
> {{tidyr::nest()}} and {{tidyr::unnest()}} cannot be implemented until
> conversion to List<Struct> is available.