jorisvandenbossche commented on issue #39539: URL: https://github.com/apache/arrow/issues/39539#issuecomment-2017751525
I would still prefer someone to first do a PR to the spec to add this. If it is just clarifying that the existing `DATETIME` dtype kind can also be used for other Arrow date and time dtypes, that should be relatively easy.

> I see that libraries are working around this by defining date and time types as protocol DATETIME data type with Apache Arrow C Data Interface format string (example `tdD` for `date32`, `tdm` for `date64` etc, see [Polars code](https://github.com/pola-rs/polars/blob/53f55367d1428b6d4ab51a7b17a8dbf4c003ac43/py-polars/polars/interchange/utils.py#L48-L50) and [pandas code](https://github.com/pandas-dev/pandas/blob/4f145b3a04ac2e9167545a8a2a09d30856d9ce42/pandas/core/interchange/utils.py#L84-L92)).

AFAIK pandas doesn't actually support this for duration, at least not for the default timedelta dtype (from testing with pandas main):

```
In [7]: from pyarrow.interchange import from_dataframe

In [8]: from_dataframe(pd.DataFrame({'a': pd.timedelta_range(0, "1 days", freq='s')}))
...
File ~/scipy/repos/pandas/pandas/core/interchange/utils.py:147, in dtype_to_arrow_c_fmt(dtype)
    144 elif isinstance(dtype, DatetimeTZDtype):
    145     return ArrowCTypes.TIMESTAMP.format(resolution=dtype.unit[0], tz=dtype.tz)
--> 147 raise NotImplementedError(
    148     f"Conversion of {dtype} to Arrow C format string is not implemented."
    149 )

NotImplementedError: Conversion of timedelta64[ns] to Arrow C format string is not implemented.
```

FWIW, my proposal to add support for the Arrow PyCapsule protocol to the interchange standard (https://github.com/data-apis/dataframe-api/pull/342) would also solve this for the case of polars and pyarrow, as both are Arrow-memory based and could easily interchange those data types.
(although that of course requires polars to implement it, and based on https://github.com/pola-rs/polars/issues/12530 that is still WIP I think)

We _could_ start checking for that protocol in `pyarrow.interchange.from_dataframe`, although that would also be an extension not covered by the official spec.
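
For illustration, a minimal sketch of what checking for the PyCapsule protocol before falling back to the interchange protocol could look like. This is not how `pyarrow.interchange.from_dataframe` works today; `pick_interchange_path` and `to_arrow_table` are hypothetical names, and the sketch assumes a pyarrow version (14 or later, as far as I know) in which `pa.table()` accepts objects that implement `__arrow_c_stream__`.

```python
def pick_interchange_path(obj):
    """Decide which protocol to use for `obj` (hypothetical helper)."""
    # Prefer the Arrow PyCapsule stream protocol: the data stays in Arrow
    # memory, so date, time and duration types pass through unchanged.
    if hasattr(obj, "__arrow_c_stream__"):
        return "arrow_c_stream"
    # Otherwise fall back to the dataframe interchange protocol.
    if hasattr(obj, "__dataframe__"):
        return "dataframe_interchange"
    raise TypeError(
        f"{type(obj).__name__} implements neither __arrow_c_stream__ "
        "nor __dataframe__"
    )


def to_arrow_table(obj):
    """Convert `obj` to a pyarrow.Table using the best available protocol."""
    # Imported lazily so the dispatch helper above has no pyarrow dependency.
    import pyarrow as pa
    from pyarrow.interchange import from_dataframe

    if pick_interchange_path(obj) == "arrow_c_stream":
        # Assumption: pa.table() consumes __arrow_c_stream__ objects.
        return pa.table(obj)
    return from_dataframe(obj)
```

Checking for `__arrow_c_stream__` first means Arrow-based producers never go through the `DATETIME` format-string workaround, while other producers keep the current behaviour.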
