[
https://issues.apache.org/jira/browse/ARROW-8813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569154#comment-17569154
]
Nigel McKernan edited comment on ARROW-8813 at 7/20/22 7:42 PM:
----------------------------------------------------------------
The issue [~domiden] references was committed into {{tidyr}} 1.1.0 back in May
of 2020, as you can see
[here|https://github.com/tidyverse/tidyr/releases#:~:text=pivot_longer()%20and%20pivot_wider()%20are%20now%20generic%20so%20implementations%0Acan%20be%20provided%20for%20objects%20other%20than%20data%20frames],
more than 2 years ago.
Would it be possible now to incorporate some {{tidyr}} methods that have been
converted to generics into {{arrow}}?
EDIT: In addition, the {{nest()}} generic now
[avoids computing on .data|https://github.com/tidyverse/tidyr/releases#:~:text=The%20nest()%20generic%20now%20avoids%20computing%20on%20.data%2C%20making%20it%20more%0Acompatible%20with%20lazy%20tibbles],
making it more compatible with lazy tables and easier to use for remote operations.
was (Author: JIRAUSER293150):
The issue [~domiden] references was committed into {{tidyr}} 1.1.0 back in May
of 2020, as you can see
[here|https://github.com/tidyverse/tidyr/releases#:~:text=pivot_longer()%20and%20pivot_wider()%20are%20now%20generic%20so%20implementations%0Acan%20be%20provided%20for%20objects%20other%20than%20data%20frames],
more than 2 years ago.
Would it be possible now to incorporate some {{tidyr}} methods that have been
converted to generics into {{arrow}}?
> [R] Implementing tidyr interface
> --------------------------------
>
> Key: ARROW-8813
> URL: https://issues.apache.org/jira/browse/ARROW-8813
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Dominic Dennenmoser
> Priority: Major
> Labels: extension, feature, improvement
>
> I think it would be reasonable to implement an interface to the {{tidyr}}
> package. Such an implementation would allow Arrow Tables to be processed
> lazily before they are pulled back into memory. Currently, however, you need
> to collect the table before applying {{tidyr}} methods. The following code
> chunk shows an example routine:
> {code:r}
> library(magrittr)
> arrow_table <- arrow::read_feather("table.feather", as_data_frame = FALSE)
> nested_df <-
>   arrow_table %>%
>   dplyr::select(ID, 4:7, Value) %>%
>   dplyr::filter(Value >= 5) %>%
>   dplyr::group_by(ID) %>%
>   dplyr::collect() %>%
>   tidyr::nest()
> {code}
> The main focus might be the following three methods:
> * {{tidyr::[un]nest()}},
> * {{tidyr::pivot_[longer|wider]()}}, and
> * {{tidyr::separate()}}.
> I suppose the last two can be implemented fairly quickly, but
> {{tidyr::nest()}} and {{tidyr::unnest()}} cannot be implemented until
> conversion to List<Struct> is available.