jonkeane commented on a change in pull request #12433:
URL: https://github.com/apache/arrow/pull/12433#discussion_r816810668
##########
File path: r/R/dplyr-funcs-type.R
##########
@@ -76,6 +76,64 @@ register_bindings_type_cast <- function() {
   register_binding("as.numeric", function(x) {
     Expression$create("cast", x, options = cast_options(to_type = float64()))
   })
+  register_binding("as.Date", function(x,
+                                       format = NULL,
+                                       tryFormats = "%Y-%m-%d",
+                                       origin = "1970-01-01",
+                                       tz = "UTC") {
+
+    if (call_binding("is.Date", x)) {
+      # base::as.Date() first converts to the desired timezone and then extracts
+      # the date, which is why we need to go through timestamp() first
+      return(x)
+
+    # cast from POSIXct
+    } else if (call_binding("is.POSIXct", x)) {
+      if (tz == "UTC") {
+        interim_x <- build_expr("cast", x, options = cast_options(to_type = timestamp(timezone = tz)))
+      } else {
+        abort("`as.Date()` with a timezone different to 'UTC' is not supported in Arrow")
+      }
+
+    # cast from character
+    } else if (call_binding("is.character", x)) {
+      # this could be improved with tryFormats once strptime returns NA and we
+      # can use coalesce - https://issues.apache.org/jira/browse/ARROW-15659
+      # TODO revisit once https://issues.apache.org/jira/browse/ARROW-15659 is done
+      if (is.null(format)) {
+        if (length(tryFormats) == 1) {
+          format <- tryFormats[1]
+        } else {
+          abort("`as.Date()` with multiple `tryFormats` is not supported in Arrow yet")
+        }
+      }
+      # if x is not an expression (e.g. passed as filter), convert it to one
+      if (!inherits(x, "Expression")) {
+        x <- build_expr("cast", x, options = cast_options(to_type = type(x)))
+      }
+      interim_x <- call_binding("strptime", x, format, unit = "s")
+
+    # cast from numeric
+    } else if (call_binding("is.numeric", x)) {
+      # the origin argument will be better supported once we implement temporal
+      # arithmetic (https://issues.apache.org/jira/browse/ARROW-14947)
+      # TODO revisit once the above has been sorted
+      if (!call_binding("is.integer", x)) {
+        # Arrow does not support direct casting from double to date so we have
+        # to convert to integers first - casting to int32() would error so we
+        # need to use round before casting

Review comment:
   Yeah, `safe` was what I was thinking. Like I said in my comment: I think `floor` is good here since that is how dates from timestamps work anyway (they round down to the start of the current whole unit until the next one begins, and don't start rounding up at noon or something). The unsafe cast rounds towards zero, so for negative numbers you are effectively getting `ceiling`. I suspect in reality no one will ever run into this (and I agree that Arrow should support `float -> date`!), but on the off chance that someone does, I think `floor` here is the right way to go. Maybe we should also add a comment that explains this reasoning so it's out in the open: we need to go from float -> int, and we want the behavior of `floor` too.
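
   To make the `floor` vs. round-towards-zero point concrete, here is a minimal base-R sketch. It does not touch the Arrow bindings at all: the sample values and the variable names are made up for illustration, and `trunc()` is only standing in for what an unsafe float -> int cast is described as doing above.

```r
# Fractional days since the epoch, one positive and one negative
# (illustrative values, not taken from the PR).
days <- c(1.7, -1.7)

# Rounding towards zero, standing in for the unsafe float -> int cast.
truncated <- trunc(days)
# floor() always rounds towards negative infinity.
floored <- floor(days)

# The two agree for positive input and differ by a whole day for negative input.
dates_trunc <- format(as.Date(truncated, origin = "1970-01-01"))
dates_floor <- format(as.Date(floored, origin = "1970-01-01"))

# -1.7 days from the epoch is the timestamp 1969-12-30 07:12:00 UTC;
# taking the date of that timestamp lands on the same day as floor().
ts <- as.POSIXct("1969-12-30 07:12:00", tz = "UTC")
date_from_ts <- format(as.Date(ts, tz = "UTC"))

print(dates_trunc)   # "1970-01-02" "1969-12-31"
print(dates_floor)   # "1970-01-02" "1969-12-30"
print(date_from_ts)  # "1969-12-30"
```

   So for pre-epoch fractional values only `floor` lands on the same day that a timestamp -> date conversion would give.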