jonkeane commented on a change in pull request #12433:
URL: https://github.com/apache/arrow/pull/12433#discussion_r817906840



##########
File path: r/R/dplyr-funcs-type.R
##########
@@ -76,6 +76,60 @@ register_bindings_type_cast <- function() {
   register_binding("as.numeric", function(x) {
     Expression$create("cast", x, options = cast_options(to_type = float64()))
   })
+  register_binding("as.Date", function(x,
+                                       format = NULL,
+                                       tryFormats = "%Y-%m-%d",
+                                       origin = "1970-01-01",
+                                       tz = "UTC") {
+
+    if (call_binding("is.Date", x)) {
+      # base::as.Date() first converts to the desired timezone and then extracts
+      # the date, which is why we need to go through timestamp() first
+      return(x)
+
+    # cast from POSIXct
+    } else if (call_binding("is.POSIXct", x)) {
+      if (tz == "UTC") {
+        x <- build_expr("cast", x, options = cast_options(to_type = timestamp(timezone = tz)))
+      } else {
+        abort("`as.Date()` with a timezone different to 'UTC' is not supported in Arrow")
+      }
+
+    # cast from character
+    } else if (call_binding("is.character", x)) {
+      # this could be improved with tryFormats once strptime returns NA and we
+      # can use coalesce - https://issues.apache.org/jira/browse/ARROW-15659
+      # TODO revisit once https://issues.apache.org/jira/browse/ARROW-15659 is done
+      if (is.null(format)) {
+        if (length(tryFormats) == 1) {
+          format <- tryFormats[1]
+        } else {
+          abort("`as.Date()` with multiple `tryFormats` is not supported in Arrow yet")
+        }
+      }
+      x <- build_expr("strptime", x, options = list(format = format, unit = 0L))

Review comment:
       _nods_ I totally understand why the structure happened. In many cases it 
would be NBD to speculatively put in structure like this, but here I found it 
quite hard to reason about what was going on when and why because of the 
nesting. One example: I honestly hadn't totally put together that the check on 
`origin` only happened for numerics until you explicitly said that(!).
   
   Another option would be to break all of these into helper functions where we 
encapsulate all of the logic for each type and can go and inspect that as 
needed. But that's probably more structure than we need right now, and who 
knows if the C++ and other tickets will actually be resolved (or resolved in 
ways that would fit with this structure).
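
   For illustration only, here is a rough sketch of what the helper-per-type 
split could look like. It uses plain base R instead of the Arrow expression 
bindings so that it runs on its own. The helper names (`date_from_posixct()`, 
`date_from_character()`, `date_from_numeric()`, `as_date_sketch()`) are 
hypothetical and not part of this PR, and the exact `origin` condition is an 
assumption, since that part of the diff is not shown here.

```r
# Hypothetical helpers, one per input type. Base R stand-ins for the Arrow
# expression logic; none of these names exist in the arrow package.
date_from_posixct <- function(x, tz = "UTC") {
  if (tz != "UTC") {
    stop("`as.Date()` with a timezone different to 'UTC' is not supported")
  }
  as.Date(x, tz = tz)
}

date_from_character <- function(x, format = NULL, tryFormats = "%Y-%m-%d") {
  if (is.null(format)) {
    if (length(tryFormats) != 1) {
      stop("`as.Date()` with multiple `tryFormats` is not supported yet")
    }
    format <- tryFormats[1]
  }
  as.Date(x, format = format)
}

date_from_numeric <- function(x, origin = "1970-01-01") {
  # Assumed check: the `origin` validation sits next to the only type using it.
  if (origin != "1970-01-01") {
    stop("`as.Date()` with a non-default `origin` is not supported")
  }
  as.Date(x, origin = origin)
}

# The top-level function becomes a flat dispatch with early returns.
as_date_sketch <- function(x,
                           format = NULL,
                           tryFormats = "%Y-%m-%d",
                           origin = "1970-01-01",
                           tz = "UTC") {
  if (inherits(x, "Date")) return(x)
  if (inherits(x, "POSIXct")) return(date_from_posixct(x, tz))
  if (is.character(x)) return(date_from_character(x, format, tryFormats))
  if (is.numeric(x)) return(date_from_numeric(x, origin))
  stop("unsupported input type")
}
```

   With this shape, which arguments matter for which input type is visible 
from each helper's signature, at the cost of the extra structure mentioned 
above.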
   
   This is admittedly a style point, and I'm not going to block this from 
merging solely because of that, but I have found the nested if/elses rather 
difficult to review while keeping all of the choices in my head.





To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to