[ 
https://issues.apache.org/jira/browse/ARROW-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415686#comment-17415686
 ] 

Joris Van den Bossche commented on ARROW-13938:
-----------------------------------------------

From my experience with pandas, I fully agree that autocasting strings to 
timestamps in operations is a very nice convenience. 
I also think there are arguments for making timestamps an exception here (we 
indeed should not autocast strings to integers or floats): there is no 
"native" scalar object for timestamps like there is for ints, floats and 
strings, so you need to create an object (e.g. by calling 
{{datetime.datetime()}}, {{datetime.datetime.strptime()}} or 
{{pd.Timestamp()}} in Python). (Although, now that I write this, the same 
could maybe be said for decimals.)

I am only wondering a bit at what level this casting should be implemented. 
Could it instead be the responsibility of the "user API" level (i.e. part of 
the R / Python bindings)? Or would that be difficult to implement at that 
level?
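
To make the "user API" option concrete, here is a minimal sketch in plain Python of what binding-level coercion could look like. It is only an illustration under stated assumptions: {{coerce_literal}} and {{greater}} are hypothetical names, not Arrow or pyarrow functions, and a plain list stands in for an Arrow array. The string is parsed into the temporal type of the other operand before the comparison is dispatched, while ints and floats are passed through untouched.

```python
import datetime as dt
from typing import Union

Temporal = Union[dt.date, dt.datetime]


def coerce_literal(value: object, like: Temporal) -> object:
    """Parse an ISO 8601 string into the same temporal type as `like`.

    Hypothetical helper: non-string values are returned unchanged, so
    ints and floats are never autocast.
    """
    if not isinstance(value, str):
        return value
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(like, dt.datetime):
        return dt.datetime.fromisoformat(value)
    if isinstance(like, dt.date):
        return dt.date.fromisoformat(value)
    return value


def greater(values: list, literal: object) -> list:
    """Hypothetical binding-level wrapper around an element-wise `>`.

    A plain list stands in for an Arrow array in this sketch.
    """
    if not values:
        return []
    rhs = coerce_literal(literal, values[0])
    return [v > rhs for v in values]


dates = [dt.date(1974, 4, 6), dt.date(1988, 5, 9)]
result = greater(dates, "1980-01-01")
print(result)  # [False, True]
```

With this approach the compute kernels would only ever see matching temporal types. The trade-off is that each binding (R, Python, ...) would need its own version of the coercion step.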

> [C++] Date and datetime types should autocast from strings
> ----------------------------------------------------------
>
>                 Key: ARROW-13938
>                 URL: https://issues.apache.org/jira/browse/ARROW-13938
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Jonathan Keane
>            Priority: Major
>
> When comparing dates and datetimes, people frequently expect that a string 
> (formatted as ISO8601) will auto-cast and compare to dates and times.
> Examples in R:
> {code:r}
> library(arrow)
> #> 
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #> 
> #>     timestamp
> arr <- Array$create(as.Date(c("1974-04-06", "1988-05-09")))
> arr > "1980-01-01"
> #> Error: NotImplemented: Function greater has no kernel matching input types (array[date32[day]], scalar[string])
> # creating the scalar as a date works, of course
> arr > Scalar$create(as.Date("1980-01-01"))
> #> Array
> #> <bool>
> #> [
> #>   false,
> #>   true
> #> ]
> # datetimes also do not auto-cast
> arr <- Array$create(Sys.time())
> arr > "1980-01-01 00:00:00"
> #> Error: NotImplemented: Function greater has no kernel matching input types (array[timestamp[us]], scalar[string])
> # or a more real-world example
> library(dplyr)
> #> 
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #> 
> #>     filter, lag
> #> The following objects are masked from 'package:base':
> #> 
> #>     intersect, setdiff, setequal, union
> mtcars$date <- as.Date(c("1974-04-06", "1988-05-09"))
> ds <- InMemoryDataset$create(mtcars)
> ds %>%
>   filter(date > "1980-01-01") %>%
>   collect()
> #> Error: NotImplemented: Function greater has no kernel matching input types (array[date32[day]], scalar[string])
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
