Vladimir created ARROW-16641:
--------------------------------
Summary: [R] How to filter array columns?
Key: ARROW-16641
URL: https://issues.apache.org/jira/browse/ARROW-16641
Project: Apache Arrow
Issue Type: Wish
Components: R
Reporter: Vladimir
Fix For: 8.0.0

In the Parquet data we have, there is a column with an array data type
(list<array_element <string>>) that flags records with various issues.
Each record can store multiple values in this column, for example
`[A, B, C]`.
I'm trying to filter the data to exclude some of the flagged records.
Filtering is trivial for regular columns that contain a single value per
record, e.g.:
{{flags_to_exclude <- c("A", "B")}}
{{datt %>% filter(! col %in% flags_to_exclude)}}
Given the array column, is it possible to exclude records with at least one of
the flags from `flags_to_exclude` using the arrow R package?
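To make the intended result concrete, here is a minimal sketch of the semantics I'm after. It runs on a small in-memory tibble with plain dplyr, not on an Arrow Table or Dataset, so it only illustrates the desired output; the `id` column and the sample values are made up for illustration, and `has_excluded_flag` is a hypothetical helper name.

```r
library(dplyr)

# Toy in-memory stand-in for the Parquet data (hypothetical values).
# `col` is a list column: each record holds zero or more flags.
datt <- tibble(
  id  = 1:4,
  col = list(c("A", "C"), c("D"), c("B"), character(0))
)

flags_to_exclude <- c("A", "B")

# TRUE when a record carries at least one of the flags to exclude.
has_excluded_flag <- function(flags) any(flags %in% flags_to_exclude)

# Keep only the records that carry none of the excluded flags.
kept <- datt %>%
  filter(!vapply(col, has_excluded_flag, logical(1)))

kept$id
```

Here records 1 and 3 are dropped because they contain "A" and "B" respectively, while records with only other flags, or with no flags at all, are kept. The question is whether the same result can be obtained without first collecting the data into R.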
I really appreciate any advice you can provide!