[
https://issues.apache.org/jira/browse/ARROW-13421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nic Crane updated ARROW-13421:
------------------------------
Description:
When reading in data where commas have been used as decimal separators (e.g.
3,141 to indicate pi), the column is read in as a character string. If I try
to specify a schema in R, i.e.:
{{read_delim_arrow("tst.csv", delim = ";", schema = schema(x = float32()))}}
I get the following error:
{{Error: Invalid: In CSV column #0: CSV conversion error to float: invalid value 'x'
/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:437 decoder_.Decode(data, size, quoted, &value)
/home/nic2/arrow/cpp/src/arrow/csv/parser.h:84 status
/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:441 parser.VisitColumn(col_index, visit)}}
Could we add support for reading data in this format? It is fairly common
in a number of countries.
was:
When reading in data where commas have been used as decimal separators (e.g.
3,141 to indicate pi), the column is read in as a character string. If I try
to specify a schema in R, i.e.:
{{read_delim_arrow("tst.csv", delim = ";", schema = schema(x = float32()))}}
I get the following error:
{{
Error: Invalid: In CSV column #0: CSV conversion error to float: invalid value
'x'
/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:437 decoder_.Decode(data,
size, quoted, &value)
/home/nic2/arrow/cpp/src/arrow/csv/parser.h:84 status
/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:441
parser.VisitColumn(col_index, visit)
}}
Please can we have the functionality to be able to read in data from this
format as it's fairly common across a number of countries?
> [C++] Add functionality for reading in columns as floats from delimited
> files where a comma has been used as a decimal separator
> ---------------------------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-13421
> URL: https://issues.apache.org/jira/browse/ARROW-13421
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Daniel Paierl
> Priority: Minor
>
> When reading in data where commas have been used as decimal separators (e.g.
> 3,141 to indicate pi), the column is read in as a character string. If I try
> to specify a schema in R, i.e.:
> {{read_delim_arrow("tst.csv", delim = ";", schema = schema(x = float32()))}}
> I get the following error:
> {{Error: Invalid: In CSV column #0: CSV conversion error to float: invalid value 'x'
> /home/nic2/arrow/cpp/src/arrow/csv/converter.cc:437 decoder_.Decode(data, size, quoted, &value)
> /home/nic2/arrow/cpp/src/arrow/csv/parser.h:84 status
> /home/nic2/arrow/cpp/src/arrow/csv/converter.cc:441 parser.VisitColumn(col_index, visit)}}
> Could we add support for reading data in this format? It is fairly common
> in a number of countries.
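
As a possible interim workaround, the column can be read as strings and converted afterwards. The following is only a minimal sketch in plain Python (standard library, not the Arrow reader), meant to illustrate the input format described above. The helper names `parse_decimal_comma` and `read_decimal_comma_column` are hypothetical, and the sketch assumes the values contain no thousands separators (e.g. it would not handle `1.234,5`).

```python
import csv
import io


def parse_decimal_comma(text: str) -> float:
    """Convert a string such as '3,141' to a float.

    Assumption: the comma is the decimal separator and no
    thousands separators are present.
    """
    return float(text.strip().replace(",", "."))


def read_decimal_comma_column(source, column, delimiter=";"):
    """Read one column from a delimited file object and return its values as floats."""
    reader = csv.DictReader(source, delimiter=delimiter)
    return [parse_decimal_comma(row[column]) for row in reader]


# Hypothetical sample mirroring the report: a header row "x" followed by
# values that use a comma as the decimal separator.
sample = io.StringIO("x\n3,141\n2,718\n")
values = read_decimal_comma_column(sample, "x")
print(values)  # [3.141, 2.718]
```

The same idea applies in R: read the column as character, replace the comma with a point, and cast to numeric. Native support in the CSV reader would avoid this extra pass over the data.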