AlenkaF opened a new pull request, #14613:
URL: https://github.com/apache/arrow/pull/14613
### Produce a `__dataframe__` object
- [ ] Implement the `DataFrame`, `Column` and `Buffers` classes
- [ ] Test `pa.Table` -> `pd.DataFrame`
What should be added/corrected after the initial test:
- [ ] Data without missing values (produce a validity buffer in case of no
missing values)
- [ ] Boolean values do not transfer correctly (only the first element is
produced)
- [ ] Variable-length strings (the test currently fails due to what seems to
be an error in the pandas implementation)
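
For reference on the validity-buffer and boolean items above: Arrow stores both boolean values and validity bitmaps bit-packed, one bit per element, least-significant bit first. The following is a minimal pure-Python sketch of that layout, not code from this PR; `unpack_bits` is a hypothetical helper, and whether the boolean issue is actually caused by reading the packed buffer byte-wise is an assumption.

```python
def unpack_bits(buf: bytes, length: int, offset: int = 0) -> list:
    """Unpack `length` bits from an LSB-first bit-packed buffer.

    Hypothetical helper for illustration only; not part of this PR.
    """
    out = []
    for i in range(length):
        bit = offset + i
        # Byte index is bit // 8, position inside the byte is bit % 8.
        out.append(bool((buf[bit // 8] >> (bit % 8)) & 1))
    return out


# [True, False, True] is stored in a single byte with bits 0 and 2 set.
packed_bools = bytes([0b00000101])
values = unpack_bits(packed_bools, length=3)

# Validity bitmap for [1, 2, None]: the first two slots are valid, the third is null.
validity = unpack_bits(bytes([0b00000011]), length=3)

# Apply the bitmap to a data buffer whose null slot holds a placeholder 0.
masked = [v if ok else None for v, ok in zip([1, 2, 0], validity)]
```

A consumer that treats the packed buffer as one value per byte would see a single element for three booleans, and a consumer that skips the validity bitmap would see the placeholder instead of a null.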
---
This code should work. Currently it does for integers and floats with
missing values:
```python
import pyarrow as pa
import pandas as pd
table = pa.table(
    {
        "a": [1, 2, None],  # dtype kind INT = 0
        "b": [3, 4, None],  # dtype kind INT = 0
        "c": [1.5, 2.5, None],  # dtype kind FLOAT = 2
        "d": [9, 10, None],  # dtype kind INT = 0
        # "e": [True, False, None],  # dtype kind BOOLEAN = 20
        # "f": ["a", "", "c"],  # dtype kind STRING = 21
    }
)
exchange_df = table.__dataframe__()
exchange_df._df
# pyarrow.Table
# a: int64
# b: int64
# c: double
# d: int64
# ----
# a: [[1,2,null]]
# b: [[3,4,null]]
# c: [[1.5,2.5,null]]
# d: [[9,10,null]]
from pandas.core.interchange.from_dataframe import from_dataframe
from_dataframe(exchange_df)
#    a  b    c   d
# 0  1  3  1.5   9
# 1  2  4  2.5  10
# 2  0  0  0.0   0
```
---
### Consume a `__dataframe__` object
- [ ] Implement `from_dataframe` method