[
https://issues.apache.org/jira/browse/ARROW-12994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17379194#comment-17379194
]
Jonathan Keane commented on ARROW-12994:
----------------------------------------
It looks like we've acquired a few more tests that fail due to local
timezone issues. It would be good to clean them up in the same way:
{code}
══ Failed ══════════════════════════════════════════════════════════════════════
── 1. Failure (test-dplyr-lubridate.R:153:3): extract hour from date ───────────
`object` not equivalent to `expected`.
Component “x”: Mean relative difference: 1
Backtrace:
1. arrow:::expect_dplyr_equal(...) test-dplyr-lubridate.R:153:2
2. arrow:::expect_equivalent(via_batch, expected, ...) helper-expectation.R:88:4
3. testthat::expect_equivalent(object, expected, ...) helper-expectation.R:46:2
── 2. Failure (test-dplyr-lubridate.R:153:3): extract hour from date ───────────
`object` not equivalent to `expected`.
Component “x”: Mean relative difference: 1
Backtrace:
1. arrow:::expect_dplyr_equal(...) test-dplyr-lubridate.R:153:2
2. arrow:::expect_equivalent(via_table, expected, ...) helper-expectation.R:98:4
3. testthat::expect_equivalent(object, expected, ...) helper-expectation.R:46:2
── 3. Failure (test-dplyr-string-functions.R:706:3): strptime ──────────────────
`%>%`(...) not equal to `tstamp`.
Component “x”: Mean absolute difference: 18000
{code}
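For illustration, here is a minimal sketch in base R of why the difference shows up and how pinning the timezone on the expected value avoids it. The zone "America/New_York" is only an assumed stand-in for a non-UTC local setting (it is not taken from the failing machine), and this is not necessarily the fix that was applied in the test suite.

```r
fmt <- "%m-%d-%Y"

# Assumption: "America/New_York" stands in for a non-UTC local zone
# (UTC-4 in August). The real machine's zone is not known from the log.
local_like <- as.POSIXct(strptime("08-05-2008", format = fmt, tz = "America/New_York"))

# The same string parsed as UTC, which is how Arrow treats it by default.
utc_based <- as.POSIXct(strptime("08-05-2008", format = fmt, tz = "UTC"))

# Difference between the two instants, in seconds.
offset_secs <- as.numeric(local_like) - as.numeric(utc_based)
offset_secs  # 14400, i.e. 4 hours

# Passing tz = "UTC" explicitly makes the expected value independent of
# the machine's local timezone.
expected <- strptime("08-05-2008", format = fmt, tz = "UTC")
```

A test that builds its expected value this way compares UTC against UTC, so it should not depend on where it runs.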
> [R] stringr tests fail on non-UTC machines due to strptime defaulting to
> local timezone and Arrow defaulting to UTC
> --------------------------------------------------------------------------------------------------------------------
>
> Key: ARROW-12994
> URL: https://issues.apache.org/jira/browse/ARROW-12994
> Project: Apache Arrow
> Issue Type: Task
> Components: R
> Affects Versions: 4.0.1
> Reporter: Mauricio 'Pachá' Vargas Sepúlveda
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> Here's the problem I detected while triaging tickets.
> This was run locally after merging from apache/arrow at commit 8773b9d and
> rebuilding both the Arrow library and the Arrow R package.
> {code:r}
> library(arrow)
> #> See arrow_info() for available features
> #>
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #>
> #> timestamp
> library(dplyr)
> #>
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #>
> #> filter, lag
> #> The following objects are masked from 'package:base':
> #>
> #> intersect, setdiff, setequal, union
> library(testthat)
> #>
> #> Attaching package: 'testthat'
> #> The following object is masked from 'package:dplyr':
> #>
> #> matches
> #> The following object is masked from 'package:arrow':
> #>
> #> matches
> tstring <- tibble(x = c("08-05-2008", NA))
> tstamp <- tibble(x = c(strptime("08-05-2008", format = "%m-%d-%Y"), NA))
> expect_equal(
> tstring %>%
> Table$create() %>%
> mutate(
> x = strptime(x, format = "%m-%d-%Y")
> ) %>%
> collect(),
> tstamp,
> check.tzone = FALSE
> )
> #> Error: `%>%`(...) not equal to `tstamp`.
> #> Component "x": Mean absolute difference: 14400
> {code}
> Removing the expectation shows that the two timestamps differ by exactly
> 4 hours (14400 seconds):
> {code:r}
> library(arrow)
> #> See arrow_info() for available features
> #>
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #>
> #> timestamp
> library(dplyr)
> #>
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #>
> #> filter, lag
> #> The following objects are masked from 'package:base':
> #>
> #> intersect, setdiff, setequal, union
> library(testthat)
> #>
> #> Attaching package: 'testthat'
> #> The following object is masked from 'package:dplyr':
> #>
> #> matches
> #> The following object is masked from 'package:arrow':
> #>
> #> matches
> tstring <- tibble(x = c("08-05-2008", NA))
> tstamp <- tibble(x = c(strptime("08-05-2008", format = "%m-%d-%Y"), NA))
> tstring %>%
> Table$create() %>%
> mutate(
> x = strptime(x, format = "%m-%d-%Y")
> ) %>%
> collect()
> #> # A tibble: 2 x 1
> #> x
> #> <dttm>
> #> 1 2008-08-04 20:00:00
> #> 2 NA
> tstamp
> #> # A tibble: 2 x 1
> #> x
> #> <dttm>
> #> 1 2008-08-05 00:00:00
> #> 2 NA
> {code}
> _Created on 2021-06-07 by the [reprex package|https://reprex.tidyverse.org]
> (v2.0.0)_