[ https://issues.apache.org/jira/browse/ARROW-17893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yaser Alraddadi updated ARROW-17893:
------------------------------------
    Description: 
When a DataFrame has top-level timedelta columns and also a column holding a list of dictionaries that themselves contain timedelta values, reading the top-level timedelta values back from Feather format sometimes returns a wrong value.

In the example below, the printed results show that, across repeated reads of the same file, the top-level timedelta is sometimes read as {color:#00875a}0 days 03:40:23 correct{color} and sometimes as {color:#de350b}153 days 01:03:20 wrong{color}.

Here is the code (it is also attached as check_timedelta.py):

 
{code:python}
from datetime import datetime, timedelta
import pandas as pd
import pyarrow.feather as feather
time_1 = datetime.fromisoformat("2022-04-21T10:18:12+03:00")
time_2 = datetime.fromisoformat("2022-04-21T13:58:35+03:00")
data = [
    {
        "waiting_time": timedelta(seconds=12, microseconds=1),
    },
    {
        "waiting_time": timedelta(seconds=1020),
    },
    {
        "waiting_time": timedelta(seconds=960),
    },
    {
        "waiting_time": timedelta(seconds=960),
    },
    {
        "waiting_time": timedelta(seconds=960),
    },
    {
        "waiting_time": timedelta(seconds=815, microseconds=1),
    },
]
df = pd.DataFrame(
    [
        {
            "time_1": time_1,
            "time_2": time_2,
            "data": data,
            "timedelta_1": time_2 - time_1,
            "timedelta_2": timedelta(hours=3, minutes=40, seconds=23),
        },
    ]
)

print("Correct timedelta_1: ", df["timedelta_1"].item())
print("Correct timedelta_2: ", df["timedelta_2"].item())

# Write the DataFrame once, using LZ4 compression.
with open("records.feather.lz4", "wb") as f:
    feather.write_feather(df, f, compression="lz4")

# Read the same file back repeatedly; the timedelta values vary between reads.
for _ in range(10):
    with open("records.feather.lz4", "rb") as f:
        print("Reading timedelta_1: ", feather.read_feather(f)["timedelta_1"].item())
        print("Reading timedelta_2: ", feather.read_feather(f)["timedelta_2"].item())
{code}
 

 

Printed Results

 
{code:java}
Correct timedelta_1:  0 days 03:40:23
Correct timedelta_2:  0 days 03:40:23
Reading timedelta_1:  0 days 03:40:23
Reading timedelta_2:  0 days 03:40:23
Reading timedelta_1:  0 days 03:40:23
Reading timedelta_2:  0 days 03:40:23
Reading timedelta_1:  153 days 01:03:20
Reading timedelta_2:  153 days 01:03:20
Reading timedelta_1:  0 days 03:40:23
Reading timedelta_2:  0 days 03:40:23
Reading timedelta_1:  0 days 03:40:23
Reading timedelta_2:  0 days 03:40:23
Reading timedelta_1:  0 days 03:40:23
Reading timedelta_2:  153 days 01:03:20
Reading timedelta_1:  153 days 01:03:20
Reading timedelta_2:  0 days 03:40:23
Reading timedelta_1:  0 days 03:40:23
Reading timedelta_2:  153 days 01:03:20
Reading timedelta_1:  153 days 01:03:20
Reading timedelta_2:  153 days 01:03:20
Reading timedelta_1:  153 days 01:03:20
Reading timedelta_2:  153 days 01:03:20{code}
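A quick arithmetic check on the printed values: the wrong reading (153 days 01:03:20) is exactly 1000 times the correct one (0 days 03:40:23). The standard-library sketch below verifies this and illustrates one possible, unconfirmed explanation. It assumes the top-level column is stored as duration[ns] (pandas timedelta64[ns]) while the nested waiting_time field is inferred as duration[us]; if the nanosecond count were decoded as microseconds, the result would be 1000 times too large. The variable names are illustrative only.

```python
from datetime import timedelta

# Value printed on a correct read (time_2 - time_1 in the repro).
correct = timedelta(hours=3, minutes=40, seconds=23)
# Value printed on a bad read.
wrong = timedelta(days=153, hours=1, minutes=3, seconds=20)

# Dividing two timedeltas yields a float ratio.
ratio = wrong / correct
print(ratio)  # 1000.0

# Hypothesis only (assumption, not confirmed): the stored value is a count of
# nanoseconds, but it is decoded as if it were a count of microseconds.
stored_ns = (correct // timedelta(microseconds=1)) * 1000
misread = timedelta(microseconds=stored_ns)
print(misread == wrong)  # True
```

If this hypothesis holds, the symptom is a unit mix-up between the top-level duration column and the nested duration field rather than corrupted bytes, since the value is off by an exact factor of 1000.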
 

 


> Wrong reading of timedelta
> --------------------------
>
>                 Key: ARROW-17893
>                 URL: https://issues.apache.org/jira/browse/ARROW-17893
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 8.0.0
>            Reporter: Yaser Alraddadi
>            Priority: Critical
>         Attachments: check_timedelta.py



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
