Ian Cook created ARROW-14649:
--------------------------------
Summary: [R] Include unused factor levels in coalesce() output
Key: ARROW-14649
URL: https://issues.apache.org/jira/browse/ARROW-14649
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Ian Cook
ARROW-14167 added support for factors in {{coalesce()}}, but the factors
it returns do not necessarily retain all of the input factor levels, as
{{coalesce()}} does when used on an R data frame.
For example, compare these, noticing the difference in the levels:
{code:r}
# R data frame
library(dplyr)

tibble(x = factor(c("a", NA_character_)), y = factor(c("b", "c"))) %>%
  mutate(y = coalesce(x, y)) %>%
  pull(y)
#> [1] a c
#> Levels: a b c
{code}
{code:r}
# Arrow Table
library(arrow)
library(dplyr)

tibble(x = factor(c("a", NA_character_)), y = factor(c("b", "c"))) %>%
  Table$create() %>%
  mutate(y = coalesce(x, y)) %>%
  pull(y)
#> [1] a c
#> Levels: a c
{code}
I'm not sure if it is practical to make Arrow return the factors with the
unused levels included like R does. If so, we should do it.
See the test in {{test-dplyr-funcs-conditional.R}} that refers to this Jira.
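To make the target behavior concrete, here is a minimal base R sketch of the result we would want. It is not a proposed Arrow implementation. The helper name {{coalesce_keep_levels()}} is hypothetical, and it assumes the desired level set is the union of the input levels in argument order, which matches the data frame output above.
{code:r}
# Hypothetical helper (not part of arrow or dplyr): coalesce two factors
# and keep the union of their levels, including unused ones.
coalesce_keep_levels <- function(x, y) {
  # Assumption: the target level set is the union of the inputs' levels,
  # in argument order.
  all_levels <- union(levels(x), levels(y))
  # Compare on character values so differing level sets cannot mix up codes.
  values <- ifelse(is.na(x), as.character(y), as.character(x))
  factor(values, levels = all_levels)
}

x <- factor(c("a", NA_character_))
y <- factor(c("b", "c"))
result <- coalesce_keep_levels(x, y)
levels(result)
#> [1] "a" "b" "c"
{code}
The unused level {{b}} survives here because the levels are set explicitly rather than inferred from the values that remain after coalescing.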