[ 
https://issues.apache.org/jira/browse/ARROW-10213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Li updated ARROW-10213:
-----------------------------
    Description: 
I'd expect this code to give 1950-01-01 twice (i.e. a timestamp -> date cast 
extracts the date component, ignoring the time component):
{code:python}
import datetime
import pyarrow as pa
arr = pa.array([
    datetime.datetime(1950, 1, 1, 0, 0, 0),
    datetime.datetime(1950, 1, 1, 12, 0, 0),
], type=pa.timestamp("ns"))
print(arr)
print(arr.cast(pa.date32(), safe=False)) {code}
However, it gives 1950-01-02 in the second case:
{noformat}
[
  1950-01-01 00:00:00.000000000,
  1950-01-01 12:00:00.000000000
]
[
  1950-01-01,
  1950-01-02
]
{noformat}
The reason is that the temporal cast simply divides the timestamp value by the 
number of units per day, and C++ integer division truncates towards 0 (note: 
Python's floor division rounds towards -Infinity, so it would give the right 
answer in this case!), resulting in -7304 days instead of -7305.
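
To make the arithmetic concrete, here is a small pure-Python sketch (not Arrow's actual C++ cast code) that reproduces both results for the 12:00 timestamp. `trunc_div` and `to_epoch_ns` are hypothetical helpers; `trunc_div` emulates C++ integer division, which truncates towards zero, while Python's `//` floors.

```python
from datetime import date, datetime, timedelta

NS_PER_DAY = 86_400 * 10**9  # nanoseconds in one day
EPOCH = datetime(1970, 1, 1)

def to_epoch_ns(dt: datetime) -> int:
    """Exact integer nanoseconds since the Unix epoch (naive datetime)."""
    td = dt - EPOCH
    return (td.days * 86_400 + td.seconds) * 10**9 + td.microseconds * 1_000

def trunc_div(a: int, b: int) -> int:
    """Hypothetical helper: integer division truncating towards zero, like C++ '/'."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q

ns = to_epoch_ns(datetime(1950, 1, 1, 12, 0, 0))  # -7304.5 days worth of ns
truncated = trunc_div(ns, NS_PER_DAY)  # C++-style: rounds towards zero
floored = ns // NS_PER_DAY             # Python-style: rounds towards -infinity

print(truncated, date(1970, 1, 1) + timedelta(days=truncated))  # -7304 1950-01-02
print(floored, date(1970, 1, 1) + timedelta(days=floored))      # -7305 1950-01-01
```

For non-negative timestamps the two divisions agree; they only diverge for instants before the epoch that are not exactly at midnight.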

Depending on the intended semantics of a temporal cast, either it should be 
fixed to extract the date component, or the rounding behavior should be 
documented and a separate kernel should be implemented for extracting the date 
component.
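
As an illustration of the "extract the date component" semantics, here is a minimal pure-Python sketch, assuming int64 nanoseconds since the epoch as input and date32 (days since the epoch) as output. `extract_date32` is a hypothetical name, not an existing Arrow kernel, and `None` stands in for a null slot.

```python
from datetime import date, timedelta
from typing import List, Optional

NS_PER_DAY = 86_400 * 10**9  # nanoseconds in one day
EPOCH_DATE = date(1970, 1, 1)

def extract_date32(ns_values: List[Optional[int]]) -> List[Optional[int]]:
    """Hypothetical kernel: map nanoseconds-since-epoch to days-since-epoch
    by flooring, so the time-of-day is dropped even before 1970.
    None (a null slot) is passed through unchanged."""
    return [None if v is None else v // NS_PER_DAY for v in ns_values]

# The two timestamps from the report, plus a null and one ns before the epoch.
values = [
    -7305 * NS_PER_DAY,                    # 1950-01-01 00:00
    -7305 * NS_PER_DAY + NS_PER_DAY // 2,  # 1950-01-01 12:00
    None,
    -1,                                    # 1969-12-31 23:59:59.999999999
]
days = extract_date32(values)
print([None if d is None else EPOCH_DATE + timedelta(days=d) for d in days])
```

Flooring maps every instant within a calendar day (UTC) to that day, which is the behavior the report expects from the cast.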


> [C++] Temporal cast from timestamp to date rounds instead of extracting date 
> component
> --------------------------------------------------------------------------------------
>
>                 Key: ARROW-10213
>                 URL: https://issues.apache.org/jira/browse/ARROW-10213
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 1.0.1
>            Reporter: David Li
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
