[
https://issues.apache.org/jira/browse/ARROW-17424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
SHIMA Tatsuya updated ARROW-17424:
----------------------------------
Description:
I believe the {{POSIXct}} type of R currently corresponds to the Arrow
{{timestamp[us, tz=UTC]}} type.
{code:r}
lubridate::as_datetime(0) |> arrow::infer_type()
#> Timestamp
#> timestamp[us, tz=UTC]
{code}
{code:r}
lubridate::as_datetime("1970-01-01 00:00:00.0000001") |>
  arrow::arrow_table(x = _)
#> Table
#> 1 rows x 1 columns
#> $x <timestamp[us, tz=UTC]>
{code}
{code:r}
df_a <- lubridate::as_datetime("1970-01-01 00:00:00.0000001") |>
  arrow::arrow_table(x = _) |>
  as.data.frame()
df_b <- lubridate::as_datetime("1970-01-01 00:00:00.0000001") |>
  tibble::tibble(x = _)
waldo::compare(df_a, df_b)
#> `old$x`: "1970-01-01"
#> `new$x`: "1970-01-01 00:00:00"
{code}
However, as shown below, POSIXct may hold data finer than a microsecond.
{code:r}
lubridate::as_datetime(0.000000001) |> as.numeric()
#> [1] 1e-09
lubridate::as_datetime("1970-01-01 00:00:00.0000001") |> as.numeric()
#> [1] 1.192093e-07
{code}
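To illustrate the consequence, here is a minimal base R sketch of what happens to a sub-microsecond value when it is stored as a whole number of microseconds versus nanoseconds. The helper {{to_ticks}} is hypothetical (it is not an arrow function) and assumes rounding to the nearest tick; Arrow's actual conversion may behave differently.

```r
# POSIXct stores seconds since the epoch as a double.
x <- as.POSIXct(1e-7, origin = "1970-01-01", tz = "UTC")
secs <- as.numeric(x)

# Hypothetical helper (not an arrow function): express seconds as a whole
# number of ticks, rounding to the nearest tick (an assumption).
to_ticks <- function(secs, ticks_per_second) round(secs * ticks_per_second)

us_ticks <- to_ticks(secs, 1e6)  # 100 ns is below one microsecond tick
ns_ticks <- to_ticks(secs, 1e9)

us_ticks
#> [1] 0
ns_ticks
#> [1] 100

# Converting the microsecond ticks back to seconds does not recover `secs`.
us_ticks / 1e6 == secs
#> [1] FALSE
```

Under these assumptions, the 100 ns offset survives a round trip only when the unit is nanoseconds.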
I don't know why it is currently set in microseconds, but is there any reason
not to set it in nanoseconds?
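When weighing this question, it may help to check how finely a double can resolve seconds at present-day epoch values, since POSIXct stores a double. The sketch below uses base R only; {{ulp}} is a hypothetical helper that estimates the spacing between adjacent doubles near a positive value, and 1660000000 is just an illustrative timestamp in 2022.

```r
# Hypothetical helper: spacing between adjacent doubles near a positive,
# normal value v (one unit in the last place).
ulp <- function(v) 2^(floor(log2(v)) - 52)

now_secs <- 1660000000  # illustrative epoch seconds (a date in 2022)

ulp(1e-7)      # spacing near the epoch: far below a nanosecond
ulp(now_secs)  # spacing near 2022
#> [1] 2.384186e-07

# Adding one nanosecond to a present-day value changes nothing.
now_secs + 1e-9 == now_secs
#> [1] TRUE
```

Under these assumptions, a POSIXct near the epoch can carry nanosecond detail, while one near the present resolves steps of roughly a quarter of a microsecond.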
was:
I believe the {{POSIXct}} type of R currently corresponds to the Arrow
{{timestamp[us, tz=UTC]}} type.
{code:r}
lubridate::as_datetime(0) |> arrow::infer_type()
#> Timestamp
#> timestamp[us, tz=UTC]
{code}
{code:r}
df_a <- lubridate::as_datetime("1970-01-01 00:00:00.0000001") |>
  arrow::arrow_table(x = _) |>
  as.data.frame()
df_b <- lubridate::as_datetime("1970-01-01 00:00:00.0000001") |>
  tibble::tibble(x = _)
waldo::compare(df_a, df_b)
#> `old$x`: "1970-01-01"
#> `new$x`: "1970-01-01 00:00:00"
{code}
However, as shown below, POSIXct may hold data finer than a microsecond.
{code:r}
lubridate::as_datetime(0.000000001) |> as.numeric()
#> [1] 1e-09
lubridate::as_datetime("1970-01-01 00:00:00.0000001") |> as.numeric()
#> [1] 1.192093e-07
{code}
I don't know why it is currently set in microseconds, but is there any reason
not to set it in nanoseconds?
> [R] Microsecond is not sufficient unit for POSIXct
> --------------------------------------------------
>
> Key: ARROW-17424
> URL: https://issues.apache.org/jira/browse/ARROW-17424
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Affects Versions: 9.0.0
> Reporter: SHIMA Tatsuya
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)