[https://issues.apache.org/jira/browse/ARROW-17241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646213#comment-17646213]
Dewey Dunnington commented on ARROW-17241:
------------------------------------------
As Antoine noted, the pattern here would be to read the numbers in as
{{double}} and then cast them to integer afterwards. I can see how the CSV
reader might incorrectly infer an integer type for such a column if no value in
scientific notation appears in the rows it uses for type inference. However,
the current error message identifies the column (#0), the target type (int32),
and the offending value ('1e+06'), which seems like enough information to fix
the problem.
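A minimal sketch of that read-as-double-then-cast workaround follows. It assumes the arrow R package is installed. The file and the column name `int` mirror the reprex quoted below. The integrality and range checks are an added safeguard and are not part of arrow's API.

```r
library(arrow)

# Write a small CSV in which one value is written in scientific notation
# (write.csv() renders 1e6 as "1e+06").
testcsv <- tempfile(fileext = ".csv")
write.csv(data.frame(int = c(1, 2, 1e6)), testcsv, row.names = FALSE)

# Read the column as double ("d"), which accepts scientific notation.
df <- read_csv_arrow(testcsv, col_types = "d", col_names = "int", skip = 1)

# Before casting, make sure no value would be truncated or fall outside
# R's 32-bit integer range (as.integer() would truncate or return NA).
stopifnot(all(df$int == trunc(df$int)))
stopifnot(all(abs(df$int) <= .Machine$integer.max))

# Cast to integer once the values are known to be whole numbers.
df$int <- as.integer(df$int)

str(df)
```

Passing `col_types` explicitly also sidesteps the type-inference case described above, because the reader no longer guesses the column type from the first rows.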
> [R] Support scientific notation for integers in csv reader
> ----------------------------------------------------------
>
> Key: ARROW-17241
> URL: https://issues.apache.org/jira/browse/ARROW-17241
> Project: Apache Arrow
> Issue Type: New Feature
> Components: R
> Environment: arrow R package 8.0.0
> Reporter: Hugo Gruson
> Priority: Minor
>
> It looks like the CSV reader doesn't support scientific notation for
> integers, as shown in the following reprex, although it works fine for
> floats/doubles.
> Could support for scientific notation for integers be added please?
>
> {noformat}
> testcsv <- tempfile(fileext = ".csv")
> c(1, 2, 1e6) |>
> as.data.frame() |>
> setNames("int") |>
> write.csv(testcsv, row.names = FALSE)
> arrow::read_csv_arrow(testcsv, col_types = "i", col_names = "int", skip = 1)
> #> Error:
> #> ! Invalid: In CSV column #0: CSV conversion error to int32: invalid value '1e+06'
> #> Backtrace:
> #> ▆
> #> 1. └─arrow (local) `<fn>`(...)
> #> 2. └─base::tryCatch(...)
> #> 3. └─base (local) tryCatchList(expr, classes, parentenv, handlers)
> #> 4. └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
> #> 5. └─value[[3L]](cond)
> #> 6. └─arrow:::handle_csv_read_error(e, schema, call)
> #> 7. └─rlang::abort(msg, call = call)
> arrow::read_csv_arrow(testcsv, col_types = "d", col_names = "int", skip = 1)
> #> # A tibble: 3 × 1
> #> int
> #> <dbl>
> #> 1 1
> #> 2 2
> #> 3 1000000
> {noformat}
>