Mauricio 'Pachá' Vargas Sepúlveda created ARROW-13169:
---------------------------------------------------------

             Summary: [R] group_by + write_dataset skips some countries with UN COMTRADE / BACI datasets
                 Key: ARROW-13169
                 URL: https://issues.apache.org/jira/browse/ARROW-13169
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
    Affects Versions: 4.0.1
            Reporter: Mauricio 'Pachá' Vargas Sepúlveda
             Fix For: 5.0.0


``` r
library(arrow)
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

url <- "https://ams3.digitaloceanspaces.com/uncomtrade/baci_hs92_1995.rds"
rds <- "baci_hs92_1995.rds"

if (!file.exists(rds)) try(download.file(url, rds))

d <- readRDS(rds)

rds_has_usa <- any(grepl("usa", unique(d$reporter_iso)))
rds_has_usa
#> [1] TRUE

dir <- "parquet/baci_hs92"

d %>% 
  group_by(year, reporter_iso) %>% 
  write_dataset(dir, hive_style = FALSE)

parquet_has_usa <- any(grepl("usa", list.files(paste0(dir, "/1995"))))
parquet_has_usa
#> [1] FALSE
```

<sup>Created on 2021-06-24 by the [reprex package](https://reprex.tidyverse.org) (v2.0.0)</sup>
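
The reprex checks only for `usa`. A possible follow-up, not part of the original report, is to list every partition value that has no directory on disk. The sketch below is base R only; `missing_partitions()` is a hypothetical helper, not an arrow function, and the demo uses made-up values so it runs without the BACI file. It assumes non-hive-style partitioning, where each directory is named after the partition value.

``` r
# Hypothetical helper (not part of arrow): return the partition values
# that have no matching subdirectory directly under `path`.
missing_partitions <- function(values, path) {
  expected <- sort(unique(as.character(values)))
  written <- basename(list.dirs(path, full.names = TRUE, recursive = FALSE))
  setdiff(expected, written)
}

# Toy demonstration with made-up values: only "chl" and "per" get a
# directory, so "usa" should be reported as missing.
demo_dir <- file.path(tempdir(), "partition_check_demo")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir)
for (iso in c("chl", "per")) dir.create(file.path(demo_dir, iso))

missing_partitions(c("chl", "per", "usa", "usa"), demo_dir)
```

With the objects from the reprex, the equivalent call would be `missing_partitions(d$reporter_iso, file.path(dir, "1995"))`, assuming every row of `d` belongs to 1995.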



