[ https://issues.apache.org/jira/browse/ARROW-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nic Crane updated ARROW-13421:
------------------------------
    Description: 
When reading in data where commas have been used as decimal separators (e.g. 
3,141 to indicate pi), the column is read in as a character string. If I try 
to specify a schema in R, e.g.:

{{read_delim_arrow("tst.csv", delim = ";", schema = schema(x = float32()))}}

I get the following error:

{{ 
Error: Invalid: In CSV column #0: CSV conversion error to float: invalid value 'x'
/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:437  decoder_.Decode(data, size, quoted, &value)
/home/nic2/arrow/cpp/src/arrow/csv/parser.h:84  status
/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:441  parser.VisitColumn(col_index, visit) 
}}

Could we please add support for reading in data in this format? It is fairly 
common across a number of countries.

  was:
In the R arrow package, read_delim_arrow lacks an option to specify the decimal 
marker (e.g. comma or point) in its parsing options.

This is a major inconvenience for data with a _comma_ as a decimal marker 
(European users), since the data is read in as a string, which requires post-hoc 
conversion of the string to double. 
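A minimal sketch of that post-hoc conversion, written in C++ (the issue's component) rather than R, and assuming the column has already been read in as strings. `ConvertCommaDecimalColumn` is a hypothetical helper, not part of Arrow: it swaps each comma for a period and parses with `std::strtod`, turning values that do not parse in full into `std::nullopt` as a stand-in for a missing value.

```cpp
#include <algorithm>
#include <cstdlib>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not an Arrow API): converts a column of strings that use
// ',' as the decimal marker into doubles. Values that cannot be parsed in full
// become std::nullopt, standing in for a missing value.
std::vector<std::optional<double>> ConvertCommaDecimalColumn(
    const std::vector<std::string>& column) {
  std::vector<std::optional<double>> out;
  out.reserve(column.size());
  for (std::string value : column) {  // copy so the input column stays untouched
    if (value.empty()) {
      out.push_back(std::nullopt);
      continue;
    }
    // Swap the comma decimal marker for the period strtod expects in the "C" locale.
    std::replace(value.begin(), value.end(), ',', '.');
    char* end = nullptr;
    double parsed = std::strtod(value.c_str(), &end);
    // Reject values with leftover characters, e.g. "1,234,5" or "abc".
    if (end != value.c_str() + value.size()) {
      out.push_back(std::nullopt);
    } else {
      out.push_back(parsed);
    }
  }
  return out;
}
```

This is the extra pass the request wants to avoid: every value is copied and scanned again after the CSV reader has already produced strings.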

Request: Add a parsing option to set the decimal marker if that is possible.

(i.e. if the source data uses commas to mark the decimal place, this feature 
request asks for such files to be processed as if a period had been used 
instead)


> [C++] Add functionality for reading in columns as floats from delimited files where a comma has been used as a decimal separator
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-13421
>                 URL: https://issues.apache.org/jira/browse/ARROW-13421
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Daniel Paierl
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
