jamesdow21 commented on issue #37192:
URL: https://github.com/apache/arrow/issues/37192#issuecomment-1680856110
My current use case is focused on replicating the `Series.dt.total_seconds()`
method of pandas's timedelta64 dtype, since the Arrow-backed duration dtype
doesn't currently support the `.dt` accessor.
As a rough example, I started with a function like this:
```
from datetime import timedelta

import pandas as pd
import pyarrow as pa


def duration_total_seconds(s: pd.Series) -> pd.Series:
    type_err_msg = f"Only supported for duration dtypes, not {s.dtype}"
    if pd.api.types.is_timedelta64_dtype(s):
        return s.dt.total_seconds()
    elif isinstance(s.dtype, pd.ArrowDtype):
        pa_type = s.dtype.pyarrow_dtype
        if not isinstance(pa_type, pa.DurationType):
            raise TypeError(type_err_msg)
        # Dividing a duration by a duration is the step that fails
        return s / timedelta(seconds=1)
    else:
        raise TypeError(type_err_msg)
```
but this raises the error
`ArrowNotImplementedError: Function 'divide' has no kernel matching input types (duration[us], duration[us])`
My current workaround is essentially to cast both of those timedeltas
to their equivalent integers and divide those:
```
import pandas as pd
import pyarrow as pa


def duration_total_seconds(s: pd.Series) -> pd.Series:
    type_err_msg = f"Only supported for duration dtypes, not {s.dtype}"
    if pd.api.types.is_timedelta64_dtype(s):
        return s.dt.total_seconds()
    elif isinstance(s.dtype, pd.ArrowDtype):
        pa_type = s.dtype.pyarrow_dtype
        if not isinstance(pa_type, pa.DurationType):
            raise TypeError(type_err_msg)
        # Number of ticks per second for each Arrow duration unit
        integer_seconds = {
            "s": 1,
            "ms": 1_000,
            "us": 1_000_000,
            "ns": 1_000_000_000,
        }[pa_type.unit]
        # Reinterpret the durations as integer ticks, then scale to seconds
        return s.astype("int64[pyarrow]") / integer_seconds
    else:
        raise TypeError(type_err_msg)
```
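The unit arithmetic behind this workaround can be checked on its own, without pandas or pyarrow. Below is a minimal sketch using only the standard library; `TICKS_PER_SECOND` and `ticks_to_seconds` are hypothetical names chosen for illustration, not part of any library API.

```python
from datetime import timedelta

# Ticks per second for each duration unit (same mapping as the workaround).
# The name TICKS_PER_SECOND is illustrative only.
TICKS_PER_SECOND = {
    "s": 1,
    "ms": 1_000,
    "us": 1_000_000,
    "ns": 1_000_000_000,
}


def ticks_to_seconds(ticks: int, unit: str) -> float:
    """Convert an integer tick count in the given unit to float seconds."""
    return ticks / TICKS_PER_SECOND[unit]


# Compare against timedelta.total_seconds() for a microsecond-resolution value.
td = timedelta(minutes=1, milliseconds=500)
ticks_us = td // timedelta(microseconds=1)  # integer number of microseconds
assert ticks_to_seconds(ticks_us, "us") == td.total_seconds()
```

Using true division (`/`) rather than floor division keeps the fractional seconds, which matches what `total_seconds()` returns.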