[ https://issues.apache.org/jira/browse/ARROW-17637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600960#comment-17600960 ]
Neal Richardson commented on ARROW-17637:
-----------------------------------------
The naive cast to date32() works:
{code}
> Array$create(lubridate::as_datetime('2022-05-05T00:00:01.676632'))
Array
<timestamp[us, tz=UTC]>
[
2022-05-05 00:00:01.676632
]
> Array$create(lubridate::as_datetime('2022-05-05T00:00:01.676632'))$cast(date32())
Array
<date32[day]>
[
2022-05-05
]
{code}
The problem appears to be in this extra cast, which seems to be there for
timezone handling:
https://github.com/apache/arrow/blob/master/r/R/dplyr-funcs-datetime.R#L329
If x is a timestamp type, we need to either keep the unit from x (the unit is
a parameter of the timestamp type and defaults to "s", hence the error) or
pass the cast option that allows truncation. (And we probably shouldn't cast
at all if x is already in the same timezone.)
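To make the two options concrete, here is a minimal sketch on a plain Array
rather than inside the dplyr binding. It is not the proposed patch: the
variable names are made up for illustration, and it uses safe = FALSE as the
way to permit truncation, which also relaxes the other safe-cast checks.

```r
library(arrow)

# Same instant as in the report, built with base R instead of lubridate.
x <- Array$create(as.POSIXct("2022-05-05 00:00:01.676632", tz = "UTC"))

# timestamp() defaults to unit "s", so a safe cast refuses to drop the
# sub-second part. Capture the error message instead of stopping.
msg <- tryCatch(
  {
    x$cast(timestamp(timezone = "UTC"))
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

# Option 1: keep the unit of x ("us" here) in the target type.
same_unit <- x$cast(timestamp("us", timezone = "UTC"))

# Option 2: target unit "s", but allow the cast to truncate.
# Note: safe = FALSE also disables the overflow and float-truncation checks.
truncated <- x$cast(timestamp("s", timezone = "UTC"), safe = FALSE)

# Either result can then be cast on to date32().
same_unit$cast(date32())
truncated$cast(date32())
```

Option 1 avoids any loss of precision before the date32() cast; option 2
drops the sub-second part first, which makes no difference once the value is
reduced to a date.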
> [R] as.Date fails going from timestamp[us] to timestamp[s]
> ----------------------------------------------------------
>
> Key: ARROW-17637
> URL: https://issues.apache.org/jira/browse/ARROW-17637
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Nicola Crane
> Priority: Major
>
> Using as.Date to convert from timestamp to date fails in Arrow even though
> this is fine in R.
> {code:r}
> library(arrow)
> library(dplyr)
> library(lubridate)
> tf <- tempfile()
> dir.create(tf)
> tbl <- tibble::tibble(x = as_datetime('2022-05-05T00:00:01.676632'))
> write_dataset(tbl, tf)
> open_dataset(tf) %>%
>   mutate(date = as.Date(x)) %>%
>   collect()
> #> Error in `collect()`:
> #> ! Invalid: Casting from timestamp[us, tz=UTC] to timestamp[s, tz=UTC] would lose data: 1651708801676632
> #> /home/nic2/arrow/cpp/src/arrow/compute/exec.cc:799 kernel_->exec(kernel_ctx_, input, out)
> #> /home/nic2/arrow/cpp/src/arrow/compute/exec.cc:767 ExecuteSingleSpan(input, &output)
> #> /home/nic2/arrow/cpp/src/arrow/compute/exec/expression.cc:597 executor->Execute( ExecBatch(std::move(arguments), all_scalar ? 1 : input.length), &listener)
> #> /home/nic2/arrow/cpp/src/arrow/compute/exec/expression.cc:579 ExecuteScalarExpression(call->arguments[i], input, exec_context)
> #> /home/nic2/arrow/cpp/src/arrow/compute/exec/project_node.cc:91 ExecuteScalarExpression(simplified_expr, target, plan()->exec_context())
> #> /home/nic2/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:573 iterator_.Next()
> #> /home/nic2/arrow/cpp/src/arrow/record_batch.cc:337 ReadNext(&batch)
> #> /home/nic2/arrow/cpp/src/arrow/record_batch.cc:351 ToRecordBatches()
> tbl %>%
>   mutate(date = as.Date(x))
> #> # A tibble: 1 × 2
> #>   x                   date
> #>   <dttm>              <date>
> #> 1 2022-05-05 00:00:01 2022-05-05
> {code}