[ 
https://issues.apache.org/jira/browse/ARROW-17424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SHIMA Tatsuya updated ARROW-17424:
----------------------------------
    Description: 
I believe the {{POSIXct}} type of R currently corresponds to the Arrow
{{timestamp[us, tz=UTC]}} type.

{code:r}
lubridate::as_datetime(0) |> arrow::infer_type()
#> Timestamp
#> timestamp[us, tz=UTC]
{code}

{code:r}
lubridate::as_datetime("1970-01-01 00:00:00.0000001") |>
  arrow::arrow_table(x = _)
#> Table
#> 1 rows x 1 columns
#> $x <timestamp[us, tz=UTC]>
{code}

{code:r}
df_a <- lubridate::as_datetime("1970-01-01 00:00:00.0000001") |>
  arrow::arrow_table(x = _) |>
  as.data.frame()

df_b <- lubridate::as_datetime("1970-01-01 00:00:00.0000001") |>
  tibble::tibble(x = _)

waldo::compare(df_a, df_b)
#> `old$x`: "1970-01-01"
#> `new$x`: "1970-01-01 00:00:00"
{code}

However, as shown below, a POSIXct value can hold data finer than a microsecond, and the conversion to {{timestamp[us, tz=UTC]}} above drops that sub-microsecond part.

{code:r}
lubridate::as_datetime(0.000000001) |> as.numeric()
#> [1] 1e-09
lubridate::as_datetime("1970-01-01 00:00:00.0000001") |> as.numeric()
#> [1] 1.192093e-07
{code}
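To make the loss concrete, here is a minimal base-R sketch of turning POSIXct seconds into whole ticks of a given unit. {{to_ticks()}} is a hypothetical helper for illustration only, not part of arrow, and it assumes the conversion rounds to the nearest tick (arrow's actual conversion code may behave differently, e.g. truncate).

{code:r}
# Hypothetical helper (not an arrow function): convert POSIXct, which is a
# double count of seconds since the epoch, into whole ticks of `unit`,
# rounding to the nearest tick.
to_ticks <- function(x, unit = c("s", "ms", "us", "ns")) {
  unit <- match.arg(unit)
  ticks_per_second <- c(s = 1, ms = 1e3, us = 1e6, ns = 1e9)[[unit]]
  round(as.numeric(x) * ticks_per_second)
}

x <- as.POSIXct(1e-9, origin = "1970-01-01", tz = "UTC")
to_ticks(x, "us")  # 0: the nanosecond is lost
to_ticks(x, "ns")  # 1: the nanosecond is kept

y <- as.POSIXct(1.192093e-07, origin = "1970-01-01", tz = "UTC")
to_ticks(y, "us")  # 0: the same kind of loss as in the round trip above
to_ticks(y, "ns")  # 119
{code}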

I don't know why the unit is currently microseconds; is there any reason
not to use nanoseconds?
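One trade-off worth weighing for that question: Arrow stores timestamps as signed 64-bit integers, so a finer unit narrows the range of representable instants. The base-R sketch below only illustrates that arithmetic; it is not a statement of why the current default was chosen, and the variable names are illustrative.

{code:r}
# An int64 holds at most 2^63 - 1 ticks; 2^63 is used here because a double
# cannot represent 2^63 - 1 exactly, and the difference is negligible.
max_ticks <- 2^63
ticks_per_second <- c(s = 1, ms = 1e3, us = 1e6, ns = 1e9)

# Approximate span on either side of 1970-01-01, in Gregorian years.
seconds_per_year <- 365.2425 * 86400
years <- max_ticks / ticks_per_second / seconds_per_year
floor(years[["us"]])  # 292277
floor(years[["ns"]])  # 292

# Latest instant (to the second) that fits at nanosecond resolution.
latest_ns <- as.POSIXct(max_ticks / 1e9, origin = "1970-01-01", tz = "UTC")
format(latest_ns, "%Y-%m-%d %H:%M:%S")  # "2262-04-11 23:47:16"
{code}

So nanoseconds cover roughly the years 1677 to 2262, while microseconds cover hundreds of thousands of years in each direction.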

  was:
I believe the {{POSIXct}} type of R currently corresponds to the Arrow
{{timestamp[us, tz=UTC]}} type.

{code:r}
lubridate::as_datetime(0) |> arrow::infer_type()
#> Timestamp
#> timestamp[us, tz=UTC]
{code}

{code:r}
df_a <- lubridate::as_datetime("1970-01-01 00:00:00.0000001") |>
  arrow::arrow_table(x = _) |>
  as.data.frame()

df_b <- lubridate::as_datetime("1970-01-01 00:00:00.0000001") |>
  tibble::tibble(x = _)

waldo::compare(df_a, df_b)
#> `old$x`: "1970-01-01"
#> `new$x`: "1970-01-01 00:00:00"
{code}

However, as shown below, POSIXct may hold data finer than a microsecond.

{code:r}
lubridate::as_datetime(0.000000001) |> as.numeric()
#> [1] 1e-09
lubridate::as_datetime("1970-01-01 00:00:00.0000001") |> as.numeric()
#> [1] 1.192093e-07
{code}

I don't know why the unit is currently microseconds; is there any reason
not to use nanoseconds?


> [R] Microsecond is not sufficient unit for POSIXct
> --------------------------------------------------
>
>                 Key: ARROW-17424
>                 URL: https://issues.apache.org/jira/browse/ARROW-17424
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>    Affects Versions: 9.0.0
>            Reporter: SHIMA Tatsuya
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
