[
https://issues.apache.org/jira/browse/ARROW-13887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nic Crane updated ARROW-13887:
------------------------------
Description:
When reading a CSV file that has a header row while also supplying a schema,
read_csv_arrow() fails because the header row is parsed as a row of data.
{code:java}
library(arrow)

share_data <- tibble::tibble(
  company = c("AMZN", "GOOG", "BKNG", "TSLA"),
  price = c(3463.12, 2884.38, 2300.46, 732.39),
  date = rep(as.Date("2021-09-03"), 4)
)
readr::write_csv(share_data, file = "share_data.csv")

# Schema whose field names match the header row of the CSV file
share_schema <- schema(
  company = utf8(),
  price = float64(),
  date = date32()
)

# Fails: the header row is parsed as data
read_csv_arrow("share_data.csv", schema = share_schema)
{code}
{code:java}
Error: Invalid: In CSV column #1: CSV conversion error to double: invalid value 'price'
/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:492 decoder_.Decode(data, size, quoted, &value)
/home/nic2/arrow/cpp/src/arrow/csv/parser.h:84 status
/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:496 parser.VisitColumn(col_index, visit)
{code}
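A possible workaround, offered as a sketch and not as a confirmed fix: because the schema already supplies the column names, the header row can be skipped explicitly. This assumes the installed arrow version's read_csv_arrow() accepts a skip argument. The names csv_path and share_tbl are illustrative, and base R's write.csv() stands in for readr::write_csv() to avoid the extra dependency.

```r
library(arrow)

# Recreate the CSV from the report (base R writer used here as a substitute
# for readr::write_csv; quote = FALSE keeps the file free of quote characters).
share_data <- data.frame(
  company = c("AMZN", "GOOG", "BKNG", "TSLA"),
  price = c(3463.12, 2884.38, 2300.46, 732.39),
  date = rep(as.Date("2021-09-03"), 4)
)
csv_path <- tempfile(fileext = ".csv")  # illustrative temporary path
write.csv(share_data, csv_path, row.names = FALSE, quote = FALSE)

share_schema <- schema(
  company = utf8(),
  price = float64(),
  date = date32()
)

# Assumption: the schema provides the column names, so the header line is
# skipped explicitly to keep it from being parsed as data.
share_tbl <- read_csv_arrow(csv_path, schema = share_schema, skip = 1)
print(share_tbl)
```

If this runs without the conversion error, share_tbl should hold the four data rows with the column names and types taken from share_schema.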
was:
While reporting another bug, I found an error when working with schemas. It is
not specific to this data type: changing the data types specified in the schema
produces similar errors. I am unsure whether this is at the R or the C++ layer.
{code:java}
share_data <- tibble::tibble(
company = c("AMZN", "GOOG", "BKNG", "TSLA"),
price = c(3463.12, 2884.38, 2300.46, 732.39),
date = rep(as.Date("2021-09-03"), 4)
)
readr::write_csv(share_data, file = "share_data.csv")
share_schema <- schema(
company = utf8(),
price = float64(),
date = date32()
)
read_csv_arrow("share_data.csv", schema = share_schema)
{code}
{code:java}
Error: Invalid: In CSV column #1: CSV conversion error to double: invalid value 'price'
/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:492 decoder_.Decode(data, size, quoted, &value)
/home/nic2/arrow/cpp/src/arrow/csv/parser.h:84 status
/home/nic2/arrow/cpp/src/arrow/csv/converter.cc:496 parser.VisitColumn(col_index, visit)
{code}
> [R] Error using schemas when reading in CSV file with headers
> -------------------------------------------------------------
>
> Key: ARROW-13887
> URL: https://issues.apache.org/jira/browse/ARROW-13887
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Reporter: Nic Crane
> Priority: Major