jonkeane commented on a change in pull request #12433:
URL: https://github.com/apache/arrow/pull/12433#discussion_r817900666



##########
File path: r/R/dplyr-funcs-type.R
##########
@@ -76,6 +76,60 @@ register_bindings_type_cast <- function() {
   register_binding("as.numeric", function(x) {
     Expression$create("cast", x, options = cast_options(to_type = float64()))
   })
+  register_binding("as.Date", function(x,
+                                       format = NULL,
+                                       tryFormats = "%Y-%m-%d",
+                                       origin = "1970-01-01",
+                                       tz = "UTC") {
+
+    if (call_binding("is.Date", x)) {
+      # base::as.Date() first converts to the desired timezone and then extracts
+      # the date, which is why we need to go through timestamp() first
+      return(x)
+
+    # cast from POSIXct
+    } else if (call_binding("is.POSIXct", x)) {
+      if (tz == "UTC") {
+        x <- build_expr("cast", x, options = cast_options(to_type = timestamp(timezone = tz)))
+      } else {
+        abort("`as.Date()` with a timezone different to 'UTC' is not supported in Arrow")
+      }

Review comment:
   My point wasn't about where the arg check happens; my point was that I don't think you need the arg check at all. I believe we have all the machinery in Arrow to support timezones other than UTC, both in the input and as the `tz` argument here.
   
   This is a more explicit version of the above, but the same general principle applies:
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   library(dplyr, warn.conflicts = FALSE)
   
   df <- data.frame(time = as.POSIXct("2020-01-01 23:30:00", tz = "America/Chicago"))
   
   df %>% 
     mutate(
       as_date = as.Date(time),
       as_date_nyc = as.Date(time, tz = "America/New_York"),
       as_date_chi = as.Date(time, tz = "America/Chicago"),
       as_date_lax = as.Date(time, tz = "America/Los_Angeles")
     ) %>%
     collect()
   #>                  time    as_date as_date_nyc as_date_chi as_date_lax
   #> 1 2020-01-01 23:30:00 2020-01-02  2020-01-02  2020-01-01  2020-01-01
   
   df %>% 
     arrow_table() %>%
     mutate(
       as_date = cast(cast(time, timestamp(timezone = "UTC")), date32()),
       as_date_nyc = cast(cast(time, timestamp(timezone = "America/New_York")), date32()),
       as_date_chi = cast(cast(time, timestamp(timezone = "America/Chicago")), date32()),
       as_date_lax = cast(cast(time, timestamp(timezone = "America/Los_Angeles")), date32())
     ) %>%
     collect()
   #>                  time    as_date as_date_nyc as_date_chi as_date_lax
   #> 1 2020-01-01 23:30:00 2020-01-02  2020-01-02  2020-01-01  2020-01-01
   ```
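
   As a rough illustration of the same principle (a sketch, not code from the PR): the target timezone can live in an ordinary variable and be passed straight to `timestamp(timezone = ...)`, which is roughly what forwarding the binding's `tz` argument without the UTC check would amount to. The name `target_tz` is hypothetical, and the sketch assumes variables from the calling environment are visible inside `mutate()` on an Arrow table.

   ```r
   library(arrow, warn.conflicts = FALSE)
   library(dplyr, warn.conflicts = FALSE)

   df <- data.frame(time = as.POSIXct("2020-01-01 23:30:00", tz = "America/Chicago"))

   # `target_tz` is a hypothetical stand-in for the binding's `tz` argument
   target_tz <- "America/New_York"

   result <- df %>%
     arrow_table() %>%
     mutate(
       # cast to a timestamp in the requested timezone, then to a date
       as_date = cast(cast(time, timestamp(timezone = target_tz)), date32())
     ) %>%
     collect()

   result
   ```

   With `target_tz` set to `"America/New_York"`, this is the same expression as `as_date_nyc` in the reprex above.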
    
    
   Am I missing something here, or misunderstanding the issue?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

