Yaser Alraddadi created ARROW-17893:
---------------------------------------
Summary: Wrong reading of timedelta
Key: ARROW-17893
URL: https://issues.apache.org/jira/browse/ARROW-17893
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 8.0.0
Reporter: Yaser Alraddadi
Attachments: check_timedelta.py
When a DataFrame contains a timedelta column together with a column holding a
list of dictionaries whose values are also timedeltas, reading the top-level
timedelta column back from a Feather file sometimes returns a wrong value.
In the example below, the printed results show that the top-level timedelta is
sometimes read as {color:#00875a}0 days 03:40:23 (correct){color} and sometimes
as {color:#de350b}153 days 01:03:20 (wrong){color}.
Here is the code; it is also attached as check_timedelta.py.
{code:python}
from datetime import datetime, timedelta

import pandas as pd
import pyarrow.feather as feather

time_1 = datetime.fromisoformat("2022-04-21T10:18:12+03:00")
time_2 = datetime.fromisoformat("2022-04-21T13:58:35+03:00")

# Nested column: a list of dictionaries that also hold timedelta values.
data = [
    {"waiting_time": timedelta(seconds=12, microseconds=1)},
    {"waiting_time": timedelta(seconds=1020)},
    {"waiting_time": timedelta(seconds=960)},
    {"waiting_time": timedelta(seconds=960)},
    {"waiting_time": timedelta(seconds=960)},
    {"waiting_time": timedelta(seconds=815, microseconds=1)},
]

# Single-row frame: two top-level timedelta columns next to the nested column.
df = pd.DataFrame(
    [
        {
            "time_1": time_1,
            "time_2": time_2,
            "data": data,
            "timedelta_1": time_2 - time_1,
            "timedelta_2": timedelta(hours=3, minutes=40, seconds=23),
        },
    ]
)

print("Correct timedelta_1: ", df["timedelta_1"].item())
print("Correct timedelta_2: ", df["timedelta_2"].item())

with open("records.feather.lz4", "wb") as f:
    feather.write_feather(df, f, compression="lz4")

# Read the same file back ten times; the top-level timedeltas are sometimes wrong.
for _ in range(10):
    with open("records.feather.lz4", "rb") as f:
        print("Reading timedelta_1: ",
              feather.read_feather(f)["timedelta_1"].item())
        print("Reading timedelta_2: ",
              feather.read_feather(f)["timedelta_2"].item())
{code}
Printed results:
{code:java}
Correct timedelta_1: 0 days 03:40:23
Correct timedelta_2: 0 days 03:40:23
Reading timedelta_1: 0 days 03:40:23
Reading timedelta_2: 0 days 03:40:23
Reading timedelta_1: 0 days 03:40:23
Reading timedelta_2: 0 days 03:40:23
Reading timedelta_1: 153 days 01:03:20
Reading timedelta_2: 153 days 01:03:20
Reading timedelta_1: 0 days 03:40:23
Reading timedelta_2: 0 days 03:40:23
Reading timedelta_1: 0 days 03:40:23
Reading timedelta_2: 0 days 03:40:23
Reading timedelta_1: 0 days 03:40:23
Reading timedelta_2: 153 days 01:03:20
Reading timedelta_1: 153 days 01:03:20
Reading timedelta_2: 0 days 03:40:23
Reading timedelta_1: 0 days 03:40:23
Reading timedelta_2: 153 days 01:03:20
Reading timedelta_1: 153 days 01:03:20
Reading timedelta_2: 153 days 01:03:20
Reading timedelta_1: 153 days 01:03:20
Reading timedelta_2: 153 days 01:03:20{code}
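A quick arithmetic check on the two values printed above may help narrow this down. The sketch below uses only the standard library and the two durations copied from the results. It shows that the wrong reading is exactly 1000 times the correct one. That is consistent with the value occasionally being interpreted with a time unit that is off by a factor of 10^3 (for example microseconds versus nanoseconds), but this is an assumption drawn from the numbers, not a confirmed diagnosis.

```python
from datetime import timedelta

# Values copied from the printed results in this report.
correct = timedelta(hours=3, minutes=40, seconds=23)         # 0 days 03:40:23
wrong = timedelta(days=153, hours=1, minutes=3, seconds=20)  # 153 days 01:03:20

# Dividing one timedelta by another yields a plain float ratio.
ratio = wrong / correct

print("correct seconds:", correct.total_seconds())
print("wrong seconds:  ", wrong.total_seconds())
print("ratio:          ", ratio)

# The wrong reading is an exact multiple of the correct one.
assert wrong == correct * 1000
```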