[ https://issues.apache.org/jira/browse/ARROW-8216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Neal Richardson resolved ARROW-8216.
------------------------------------
    Resolution: Fixed

Issue resolved by pull request 6732
[https://github.com/apache/arrow/pull/6732]

> [R][C++][Dataset] Filtering returns all-missing rows where the filtering column is missing
> ------------------------------------------------------------------------------------------
>
>                 Key: ARROW-8216
>                 URL: https://issues.apache.org/jira/browse/ARROW-8216
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 0.16.0
>         Environment: R 3.6.3, Windows 10
>            Reporter: Sam Albers
>            Assignee: Ben Kietzman
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 0.17.0
>
>          Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> I have just noticed some slightly odd behaviour with the filter method for
> Dataset.
>
> {code}
> library(arrow)
> library(dplyr)
> packageVersion("arrow")
> #> [1] '0.16.0.20200323'
>
> ## Make sample parquet
> starwars$hair_color[starwars$hair_color == "brown"] <- ""
> dir <- tempdir()
> fpath <- file.path(dir, "data.parquet")
> write_parquet(starwars, fpath)
>
> ## df in memory
> df_mem <- starwars %>%
>   filter(hair_color == "")
>
> ## reading from the parquet
> df_parquet <- read_parquet(fpath) %>%
>   filter(hair_color == "")
>
> ## using open_dataset
> df_dataset <- open_dataset(dir) %>%
>   filter(hair_color == "") %>%
>   collect()
>
> identical(df_mem, df_parquet)
> #> [1] TRUE
> identical(df_mem, df_dataset)
> #> [1] FALSE
> {code}
>
> I'm pretty sure all these should return the same data.frame. Am I missing
> something?
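The issue title hinges on how a comparison treats missing values. Below is a minimal sketch in base R, assuming a hypothetical three-row data frame in place of `starwars` (no arrow or dplyr required). Comparing `NA` with `""` yields `NA`, and `dplyr::filter()` keeps only rows where the predicate is `TRUE`; in base R, `which()` gives the same result. Plain logical indexing with an `NA` predicate instead returns an all-missing row, which is the shape of output the title reports for the Dataset path. This is an analogy for the symptom only, not a description of the Arrow C++ code changed in pull request 6732.

```r
# Hypothetical toy data standing in for starwars: one empty string,
# one real colour, and one missing value in the filtering column.
df <- data.frame(
  name = c("a", "b", "c"),
  hair_color = c("", "blond", NA),
  stringsAsFactors = FALSE
)

# Comparing with a missing value yields NA, not FALSE.
pred <- df$hair_color == ""
print(pred)
#> [1]  TRUE FALSE    NA

# dplyr::filter() keeps only rows where the predicate is TRUE; in base R,
# which() drops the NA entries and gives the same result.
kept <- df[which(pred), ]
print(nrow(kept))
#> [1] 1

# Plain logical indexing turns the NA entry into an all-NA row,
# analogous to the extra rows the issue title describes.
naive <- df[pred, ]
print(nrow(naive))
#> [1] 2
```

With the fix in 0.17.0, the Dataset query is expected to behave like `kept`, matching `df_mem` and `df_parquet` in the reproducer.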