AlenkaF commented on issue #19770:
URL: https://github.com/apache/arrow/issues/19770#issuecomment-1481054673
I think this can be closed as the datetime object can now be preserved with
the use of `timestamp_as_object=True`:
```python
import pyarrow as pa
pa.__version__
# '12.0.0.dev279+gb20734438'
import pandas as pd
from datetime import datetime
datetime_data = [
    [datetime(2015, 1, 5, 12, 0, 0), datetime(2020, 8, 22, 10, 5, 0)],
    [datetime(2024, 5, 5, 5, 49, 1), datetime(2015, 12, 24, 22, 10, 17)],
    [datetime(1996, 4, 30, 2, 38, 11)],
    None,
    [datetime(1987, 1, 27, 8, 21, 59)],
]
df = pd.DataFrame({'a': datetime_data})
table = pa.table(df)
table.to_pandas(timestamp_as_object=True).values
# array([[array([datetime.datetime(2015, 1, 5, 12, 0),
# datetime.datetime(2020, 8, 22, 10, 5)], dtype=object)],
# [array([datetime.datetime(2024, 5, 5, 5, 49, 1),
# datetime.datetime(2015, 12, 24, 22, 10, 17)],
# dtype=object)],
# [array([datetime.datetime(1996, 4, 30, 2, 38, 11)], dtype=object)],
# [None],
# [array([datetime.datetime(1987, 1, 27, 8, 21, 59)], dtype=object)]],
# dtype=object)
```
There is still an issue where the list roundtrips to a numpy array of numpy
arrays rather than to Python lists, but this is already tracked in other issues
(https://github.com/apache/arrow/issues/34574,
https://github.com/apache/arrow/issues/20222). We could think of supporting an
option to preserve the list dtype as well, but that should come after the
optimisation of `to_pylist` (https://github.com/apache/arrow/issues/28694).