Mauricio 'Pachá' Vargas Sepúlveda created ARROW-13169:
---------------------------------------------------------
Summary: [R] group_by + write_dataset skips some countries with UN COMTRADE / BACI datasets
Key: ARROW-13169
URL: https://issues.apache.org/jira/browse/ARROW-13169
Project: Apache Arrow
Issue Type: Bug
Components: R
Affects Versions: 4.0.1
Reporter: Mauricio 'Pachá' Vargas Sepúlveda
Fix For: 5.0.0
``` r
library(arrow)
#>
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#>
#> timestamp
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
url <- "https://ams3.digitaloceanspaces.com/uncomtrade/baci_hs92_1995.rds"
rds <- "baci_hs92_1995.rds"
if (!file.exists(rds)) try(download.file(url, rds))
d <- readRDS(rds)
rds_has_usa <- any(grepl("usa", unique(d$reporter_iso)))
rds_has_usa
#> [1] TRUE
dir <- "parquet/baci_hs92"
d %>%
  group_by(year, reporter_iso) %>%
  write_dataset(dir, hive_style = FALSE)
parquet_has_usa <- any(grepl("usa", list.files(paste0(dir, "/1995"))))
parquet_has_usa
#> [1] FALSE
```
<sup>Created on 2021-06-24 by the [reprex package](https://reprex.tidyverse.org) (v2.0.0)</sup>
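
The check above only looks for `usa`. To list every partition value that has no directory on disk, a sketch along these lines might help. It assumes the non-hive layout used in the reprex, where each `reporter_iso` value becomes a directory name under the year folder. `missing_partitions` is a hypothetical helper introduced here, not part of arrow, and the demo uses a temporary directory with made-up folder names instead of the real dataset.

```r
# Hypothetical helper (not part of arrow): returns the partition values
# that have no matching entry (directory or file name) under `path`.
missing_partitions <- function(values, path) {
  expected <- sort(unique(as.character(values)))
  written <- list.files(path)
  setdiff(expected, written)
}

# Self-contained demo: a temporary directory stands in for
# "parquet/baci_hs92/1995"; only "chl" and "per" were "written".
tmp <- file.path(tempdir(), "partition_check_demo")
dir.create(file.path(tmp, "chl"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(tmp, "per"), showWarnings = FALSE)

missing_partitions(c("chl", "per", "usa", "usa"), tmp)
#> [1] "usa"
```

Against the reprex objects, the equivalent call would be `missing_partitions(d$reporter_iso, file.path(dir, "1995"))`, assuming `d` holds only 1995 rows.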