[ 
https://issues.apache.org/jira/browse/ARROW-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17412165#comment-17412165
 ] 

Jonathan Keane commented on ARROW-13938:
----------------------------------------

In general I agree that lots of string autocasts might be more trouble than 
they're worth, though dates/datetimes are special enough that we should 
consider autocasting them. R is not the only case where that works: many 
flavors of SQL do this, and pandas does too. Not that we should follow past 
decisions if we think they are wrong, but IMO doing this for dates/datetimes is 
beneficial. 

{code}
>>> import pandas as pd
>>> df = pd.DataFrame({'date' : ['2020-08-09', '2020-08-25', '2020-09-05', 
...                              '2020-09-12', '2020-09-29', '2020-10-15', 
...                              '2020-11-21', '2020-12-02', '2020-12-10', 
...                              '2020-12-18']})
>>> 
>>> df['date'] = pd.to_datetime(df['date'])
>>> df["date"] > "2020-10-01"
0    False
1    False
2    False
3    False
4    False
5     True
6     True
7     True
8     True
9     True
Name: date, dtype: bool
{code}
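
To make the requested behavior concrete, here is a rough sketch in plain Python (standard library only). It is not Arrow's implementation or API: `coerce_like` and `greater` are hypothetical names used for illustration, and the sketch assumes the string is ISO 8601 and that only a string scalar on the right-hand side is cast.

```python
from datetime import date, datetime


def coerce_like(value, reference):
    """Hypothetical helper: parse an ISO 8601 string into the type of `reference`.

    Non-string values are returned unchanged.
    """
    if isinstance(value, str):
        # Check datetime first, because datetime is a subclass of date.
        if isinstance(reference, datetime):
            return datetime.fromisoformat(value)
        if isinstance(reference, date):
            return date.fromisoformat(value)
    return value


def greater(values, scalar):
    """Hypothetical element-wise `values > scalar` with string autocasting."""
    if not values:
        return []
    # Cast the scalar once, based on the element type of the column.
    rhs = coerce_like(scalar, values[0])
    return [v > rhs for v in values]


dates = [date(1974, 4, 6), date(1988, 5, 9)]
result = greater(dates, "1980-01-01")
print(result)  # [False, True]
```

A string that is not valid ISO 8601 raises `ValueError` here instead of being compared as text, which is roughly the failure mode I would expect from an autocast.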

> [C++] Date and datetime types should autocast from strings
> ----------------------------------------------------------
>
>                 Key: ARROW-13938
>                 URL: https://issues.apache.org/jira/browse/ARROW-13938
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Jonathan Keane
>            Priority: Major
>
> When comparing dates and datetimes, people frequently expect that a string 
> (formatted as ISO8601) will auto-cast and compare to dates and times.
> Examples in R:
> {code:r}
> library(arrow)
> #> 
> #> Attaching package: 'arrow'
> #> The following object is masked from 'package:utils':
> #> 
> #>     timestamp
> arr <- Array$create(as.Date(c("1974-04-06", "1988-05-09")))
> arr > "1980-01-01"
> #> Error: NotImplemented: Function greater has no kernel matching input types (array[date32[day]], scalar[string])
> # creating the scalar as a date works, of course
> arr > Scalar$create(as.Date("1980-01-01"))
> #> Array
> #> <bool>
> #> [
> #>   false,
> #>   true
> #> ]
> # datetimes also do not auto-cast
> arr <- Array$create(Sys.time())
> arr > "1980-01-01 00:00:00"
> #> Error: NotImplemented: Function greater has no kernel matching input types (array[timestamp[us]], scalar[string])
> # or a more real-world example
> library(dplyr)
> #> 
> #> Attaching package: 'dplyr'
> #> The following objects are masked from 'package:stats':
> #> 
> #>     filter, lag
> #> The following objects are masked from 'package:base':
> #> 
> #>     intersect, setdiff, setequal, union
> mtcars$date <- as.Date(c("1974-04-06", "1988-05-09"))
> ds <- InMemoryDataset$create(mtcars)
> ds %>%
>   filter(date > "1980-01-01") %>%
>   collect()
> #> Error: NotImplemented: Function greater has no kernel matching input types (array[date32[day]], scalar[string])
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
