[
https://issues.apache.org/jira/browse/ARROW-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384873#comment-17384873
]
Daniel Paierl commented on ARROW-13421:
---------------------------------------
Hi [~thisisnic], thanks for the super fast reply! Sorry I forgot the reprex; it's
easy to forget how insular these "," vs. "." problems are when the European
decimal comma and thousands separator are standard here. Sadly, I cannot change
the format of the source data; even using .parquet files is a major departure
from what has been done in the past.
Without further ado:
h2. Reprex
{code:r}
set.seed(1)
# random values
tbl <- tibble::tibble(x = rnorm(5))
tbl
## # A tibble: 5 x 1
## x
## <dbl>
## 1 -0.626
## 2 0.184
## 3 -0.836
## 4 1.60
## 5 0.330
# write to file in European format (separator = ";", decimal mark = ",")
readr::write_csv2(tbl, here::here("01_proc_data/arrow_repex.csv"))
# read in with delim set to ";"
arrow::read_delim_arrow(file = here::here("01_proc_data/arrow_repex.csv"),
                        delim = ";")
## # A tibble: 5 x 1
## x
## <chr>
## 1 -0,626453810742332
## 2 0,183643324222082
## 3 -0,835628612410047
## 4 1,595280802137792
## 5 0,329507771815361
{code}
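For reference, the post-hoc conversion I currently have to do looks roughly like the sketch below. It uses base R only and assumes the column contains no thousands separator; {{comma_to_double}} is a hypothetical helper name of my own, not part of arrow.
{code:r}
# Hypothetical helper (not an arrow function): turn strings that use ","
# as the decimal mark into doubles.
# Assumption: no grouping mark (e.g. "1.234,5") appears in the input.
comma_to_double <- function(x) {
  # Replace the single decimal comma with a point, then parse as numeric
  as.numeric(sub(",", ".", x, fixed = TRUE))
}

# Two of the strings returned by read_delim_arrow() above
chr <- c("-0,626453810742332", "0,183643324222082")
dbl <- comma_to_double(chr)
is.double(dbl)
{code}
This works, but it means every numeric column is first materialised as a string and then parsed a second time in R, which is why a parsing option would be preferable.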
h3. Session Info
{code:r}
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server x64 (build 14393)
Matrix products: default
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] arrow_4.0.1 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.6
[5] purrr_0.3.4 readr_1.4.0 tidyr_1.1.3 tibble_3.1.1
[9] ggplot2_3.3.3 tidyverse_1.3.1
{code}
> [R] Add choice for decimal marker in read_delim_arrow
> -----------------------------------------------------
>
> Key: ARROW-13421
> URL: https://issues.apache.org/jira/browse/ARROW-13421
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Affects Versions: 4.0.1
> Reporter: Daniel Paierl
> Priority: Minor
> Labels: R
>
> In the R arrow package read_delim_arrow lacks the option to specify the
> decimal marker (e.g. comma or point) in the parsing options.
> This is a major inconvenience for data with a _comma_ as a decimal marker
> (European users), since the data is read in as a string, which requires post-hoc
> conversion of the string to double.
>
> Request: Add a parsing option to set the decimal marker if that is possible.