jonkeane commented on code in PR #12738:
URL: https://github.com/apache/arrow/pull/12738#discussion_r854402917


##########
r/R/dplyr-funcs-datetime.R:
##########
@@ -372,3 +372,50 @@ binding_format_datetime <- function(x, format = "", tz = "", usetz = FALSE) {
 
   build_expr("strftime", x, options = list(format = format, locale = Sys.getlocale("LC_TIME")))
 }
+
+binding_as_date <- function(x,
+                            format = NULL,
+                            tryFormats = "%Y-%m-%d",
+                            origin = "1970-01-01",
+                            tz = "UTC",
+                            base = TRUE) {
+
+  if (is.null(format) && length(tryFormats) > 1) {
+    abort("`as.Date()` with multiple `tryFormats` is not supported in Arrow")
+  }
+
+  if (call_binding("is.Date", x)) {
+    return(x)
+
+    # cast from POSIXct
+  } else if (call_binding("is.POSIXct", x)) {
+    # base::as.Date() first converts to the desired timezone and then extracts
+    # the date, which is why we need to go through timestamp() first
+    if (base || !is.null(tz)) {
+      x <- build_expr("cast", x, options = cast_options(to_type = timestamp(timezone = tz)))
+    }
+    # POSIXct is of type double -> we need this to prevent going down the
+    # "double" branch
+    x <- x
+
+    # cast from character
+  } else if (call_binding("is.character", x)) {
+    format <- format %||% tryFormats[[1]]
+    # unit = 0L is the identifier for seconds in valid_time32_units
+    x <- build_expr("strptime", x, options = list(format = format, unit = 0L))
+
+    # cast from numeric
+  } else if (call_binding("is.numeric", x) &
+             (!call_binding("is.integer", x) | origin != "1970-01-01")) {
+    # Arrow does not support direct casting from double to date32(), but for
+    # integer-like values we can go via int32()
+    # https://issues.apache.org/jira/browse/ARROW-15798
+    # TODO revisit if arrow decides to support double -> date casting
+    x <- build_expr("cast", x, options = cast_options(to_type = int32()))
+    delta_in_sec <- call_binding("difftime", origin, "1970-01-01")
+    delta_in_sec <- build_expr("cast", delta_in_sec, options = cast_options(to_type = int64()))
+    delta_in_days <- (delta_in_sec / 86400L)$cast(int32())
+    x <- build_expr("+", x, delta_in_days)

Review Comment:
   > On the Jira for the cast from duration to int32. There isn't one. I'm not 
sure how clearly documented this is, but it seems int64 is the matching type 
for quite a bit of the date-time/ duration types. We could extend the scope of 
[ARROW-15858](https://issues.apache.org/jira/browse/ARROW-15858)? Would that 
make sense?
   
   Right, it makes sense that we have the most coverage on int64, since that's 
what most of the date, time, and duration types are stored as. But IMO there's 
no reason we shouldn't add C++ casts from int32 to the various date, time, and 
duration types if int64 already works for them. 
   
   I think ARROW-15858 is slightly separate because it deals with `difftime` 
objects in R (though it is related...).  
   
   But this behavior is not great. In an ideal world, we shouldn't need to go 
from int32 -> int64 and then to a duration whenever we do these casts. A 
targeted Jira that describes that would be 💯 (and it should be linked here so 
we know what we can get rid of when that's done).
   ```
   > int32 <- Scalar$create(1, type=int32())
   > int32$cast(duration())
   Error: NotImplemented: Unsupported cast from int32 to duration using function cast_duration
   > int64 <- Scalar$create(1, type=int64())
   > int64$cast(duration())
   Scalar
   1
   ```
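   
   A minimal sketch of the two-step workaround this forces on us today, 
assuming the int32 -> int64 cast itself is supported (the variable names and 
the explicit `"s"` unit are illustrative choices, not taken from the PR):
   
   ```r
   library(arrow)

   # int32 -> duration is not implemented (see the error above),
   # so hop through int64 first and cast to a duration from there.
   int32_scalar <- Scalar$create(1L, type = int32())
   int64_scalar <- int32_scalar$cast(int64())
   dur <- int64_scalar$cast(duration("s"))
   ```
   
   Once a direct int32 -> duration cast exists in C++, the intermediate 
`int64()` step should be removable.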
   
   Tangentially: we might want to clean up the language on that ticket too. 
Mentioning floats there might be confusing, since we already know that casting 
float -> duration does not work in Arrow (yet?):
   
   ```
   > float <- Scalar$create(1, type=float())
   > type(float)
   Float32
   float
   > float$cast(duration())
   Error: NotImplemented: Unsupported cast from float to duration using function cast_duration
   ```
   
   
   > Makes sense. I created 
[ARROW-16257](https://issues.apache.org/jira/browse/ARROW-16257) for breaking 
it up in individual functions. I'd like to get this merged in time for 8.0.0 
release and then we can follow-up to break each piece into an individual 
function. What do you think?
   
   I won't demand that we do the refactoring now, but we do have to get the 
behavior right, and I suspect that refactoring now will make it easier to see 
what the issues are and how to fix them. 
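   
   As a rough illustration of what one such individual piece could look like, 
here is a hypothetical base-R helper (the name `origin_offset_days` is made up 
and it is not part of the PR or of arrow) that isolates the origin-offset 
arithmetic from the `delta_in_sec / 86400L` step in the diff:
   
   ```r
   # Hypothetical helper: whole days between `origin` and the Unix epoch.
   # Plain base R on ordinary values, not an Arrow expression.
   origin_offset_days <- function(origin = "1970-01-01") {
     delta_in_sec <- as.numeric(
       difftime(as.Date(origin), as.Date("1970-01-01"), units = "secs")
     )
     # 86400 seconds per day
     as.integer(delta_in_sec / 86400)
   }

   origin_offset_days("1970-01-02")
   origin_offset_days("2000-01-01")
   ```
   
   With a step like this pulled out, it can be tested on its own, separately 
from the type dispatch in `binding_as_date()`.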



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.