[ https://issues.apache.org/jira/browse/ARROW-15123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-15123:
-----------------------------------
Labels: pull-request-available schema (was: schema)
> [R] Schema order not respected and file header ignored
> ------------------------------------------------------
>
> Key: ARROW-15123
> URL: https://issues.apache.org/jira/browse/ARROW-15123
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 6.0.0, 6.0.1
> Reporter: N D
> Assignee: Nicola Crane
> Priority: Major
> Labels: pull-request-available, schema
> Attachments: reprex-arrow-6-read.tar.gz
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In `arrow` 6.0.0+ for R, when I read in a CSV file using a schema where the
> order of the columns in the schema doesn't match the order of columns in the
> CSV, the data is read in incorrectly.
> The header is included as an observation in the read-in dataset. The columns
> are renamed *but not reordered* to match the schema, so I end up with the
> "quantile" column called "location", etc., as below.
> {code:java}
> [1] "last few obs in sorted order with arrow"
> # A tibble: 6 × 7
>   forecast_date target       target_end_date location type       quantile value
>   <chr>         <chr>        <chr>           <chr>    <chr>      <chr>    <chr>
> 1 2021-12-12    9 day ahead… 2021-12-21      0.99     946.43313… 06       quant…
> 2 2021-12-12    9 day ahead… 2021-12-21      0.99     956.43294… 39       quant…
> 3 2021-12-12    9 day ahead… 2021-12-21      0.99     97.948144… 41       quant…
> 4 2021-12-12    9 day ahead… 2021-12-21      0.99     98.573545… 49       quant…
> 5 2021-12-12    9 day ahead… 2021-12-21      0.99     98.978636… 33       quant…
> 6 forecast_date target       target_end_date quantile value      location type
> {code}
> The last line ("forecast_date target...") is the original header.
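To make the failure mode concrete, here is a language-neutral sketch in Python (stdlib only) of what the output above implies: the schema's field names are assigned to the CSV columns by position, and the header line is kept as a data row. This is an illustration under stated assumptions, not arrow's actual code path. `read_positionally`, `CSV_TEXT`, and the sample values are made up for the example; only the column names come from the issue.

```python
import csv
import io

# Made-up sample rows; only the column names come from the issue.
# The CSV column order differs from the schema order.
CSV_TEXT = (
    "forecast_date,target,target_end_date,quantile,value,location,type\n"
    "2021-12-12,tgt,2021-12-21,0.99,1.5,06,quantile\n"
    "2021-12-12,tgt,2021-12-21,0.99,2.5,39,quantile\n"
)

# Schema field order, as printed in the tibble above.
SCHEMA = ["forecast_date", "target", "target_end_date",
          "location", "type", "quantile", "value"]


def read_positionally(text, names):
    """Hypothetical helper reproducing the symptom: schema names are
    zipped onto columns by position, and no line is skipped as a header."""
    rows = list(csv.reader(io.StringIO(text)))
    return [dict(zip(names, row)) for row in rows]


rows = read_positionally(CSV_TEXT, SCHEMA)
print(len(rows))             # 3: two data lines plus the header line
print(rows[1]["location"])   # 0.99, a quantile value under the "location" name
print(rows[0]["location"])   # quantile, i.e. header text kept as data
```

Applied to the real file, the same logic turns 45360 data lines plus one header line into 45361 rows, which matches the dimensions reported for the file.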
> The file in question
> ([https://raw.githubusercontent.com/reichlab/covid19-forecast-hub/master/data-processed/JHUAPL-Gecko/2021-12-12-JHUAPL-Gecko.csv])
> has 45360 observations plus 1 header line. But the read-in dataset has
> {code:java}
> [1] "dimensions with arrow"
> [1] 45361 7 {code}
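In contrast to the 45361 rows above, the behaviour the report expects can be sketched (again in Python stdlib, not arrow's API) as: consume the header line, locate each schema field by name, and emit the columns in schema order, so a file with N data lines plus one header yields N rows. `read_by_name`, `CSV_TEXT`, and the sample values are hypothetical; only the column names come from the issue.

```python
import csv
import io

# Made-up sample rows; only the column names come from the issue.
CSV_TEXT = (
    "forecast_date,target,target_end_date,quantile,value,location,type\n"
    "2021-12-12,tgt,2021-12-21,0.99,1.5,06,quantile\n"
    "2021-12-12,tgt,2021-12-21,0.99,2.5,39,quantile\n"
)

# Schema field order, which differs from the CSV column order.
SCHEMA = ["forecast_date", "target", "target_end_date",
          "location", "type", "quantile", "value"]


def read_by_name(text, names):
    """Hypothetical helper: use the header to find each schema field by
    name and return rows with keys in schema order, header excluded."""
    reader = csv.DictReader(io.StringIO(text))  # reads the header line
    missing = set(names) - set(reader.fieldnames)
    if missing:
        # Fail loudly instead of silently renaming columns by position.
        raise ValueError(f"columns not in header: {sorted(missing)}")
    return [{name: row[name] for name in names} for row in reader]


rows = read_by_name(CSV_TEXT, SCHEMA)
print(len(rows))            # 2: one row per data line
print(rows[0]["location"])  # 06
```

Raising on a missing column is a design choice in this sketch; a reader could instead fall back to positional naming, but that fallback is exactly what produces the mislabeled columns described above.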
> Reprex attached with working (`packageVersion("arrow") == 4.0.1`; 5.0.0 also
> works) and non-working (`packageVersion("arrow") == 6.0.1`) examples. Run
> examples using `make run-broken` and `make run-works`.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)