[GitHub] [arrow] AlenkaF opened a new pull request, #14613: ARROW-18152: [Python] DataFrame Interchange Protocol for pyarrow Table

GitBox Wed, 09 Nov 2022 00:20:03 -0800


AlenkaF opened a new pull request, #14613:
URL: https://github.com/apache/arrow/pull/14613


   ### Produce a `__dataframe__` object
   - [ ] Implement the `DataFrame`, `Column` and `Buffers` classes
   - [ ] Test `pa.Table` -> `pd.DataFrame`
   
   What should be added/corrected after the initial test:
   - [ ] Data without missing values (produce a validity buffer in case of no missing values)
   - [ ] Boolean values do not transfer correctly (only the first element is produced)
   - [ ] Variable-length strings (the test currently fails due to what seems to be an error in the pandas implementation)
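
The first two items both touch on bit-packed buffers: Arrow stores validity bitmaps and boolean values eight per byte, least-significant bit first. Whether that packing explains the single-element boolean result is an assumption here, not a diagnosis. Below is a minimal pure-Python sketch of the packing; `pack_bits` and `all_valid_bitmask` are hypothetical helper names and are not part of pyarrow or this PR.

```python
def pack_bits(flags):
    """Pack booleans into bytes, least-significant bit first (Arrow's bit order)."""
    out = bytearray((len(flags) + 7) // 8)  # round up to whole bytes
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def all_valid_bitmask(length):
    """Hypothetical helper: a validity bitmask that marks all `length` slots as valid."""
    return pack_bits([True] * length)


# Three booleans share one byte: bits 0 and 2 are set, giving 0b00000101.
packed = pack_bits([True, False, True])

# Ten valid slots need two bytes; the unused high bits of the last byte stay 0.
mask = all_valid_bitmask(10)
```

A consumer that assumes one byte per value would see only one element in `packed`, which is why the element count has to be tracked separately from the buffer size.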
   
   ---
   
   The following code should work. Currently it works for integers and floats with missing values:
   ```python
   import pyarrow as pa
   import pandas as pd
   
   table = pa.table(
       {
           "a": [1, 2, None],  # dtype kind INT = 0
           "b": [3, 4, None],  # dtype kind INT = 0
           "c": [1.5, 2.5, None],  # dtype kind FLOAT = 2
           "d": [9, 10, None],  # dtype kind INT = 0
           # "e": [True, False, None],  # dtype kind BOOLEAN = 20
           # "f": ["a", "", "c"],  # dtype kind STRING = 21
       }
   )
   
   exchange_df = table.__dataframe__()
   exchange_df._df
   # pyarrow.Table
   # a: int64
   # b: int64
   # c: double
   # d: int64
   # ----
   # a: [[1,2,null]]
   # b: [[3,4,null]]
   # c: [[1.5,2.5,null]]
   # d: [[9,10,null]]
   
   from pandas.core.interchange.from_dataframe import from_dataframe
   from_dataframe(exchange_df)
   #    a  b    c   d
   # 0  1  3  1.5   9
   # 1  2  4  2.5  10
   # 2  0  0  0.0   0
   ```
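
In the output above, the null slots come back as zeros. To restore them, a consumer has to combine the data buffer with the validity information. The following is a minimal pure-Python sketch of that step, assuming an Arrow-style bitmask in which a set bit means "valid"; `unpack_bits` and `restore_nulls` are hypothetical names, and the `0` placed in the null slot is illustrative only.

```python
def unpack_bits(buf, length, offset=0):
    """Read `length` bits from `buf`, least-significant bit first, starting at `offset`."""
    return [
        bool((buf[(offset + i) // 8] >> ((offset + i) % 8)) & 1)
        for i in range(length)
    ]


def restore_nulls(values, validity, offset=0):
    """Replace every slot whose validity bit is 0 with None."""
    valid = unpack_bits(validity, len(values), offset)
    return [value if ok else None for value, ok in zip(values, valid)]


# Column "a" from the example: data slots [1, 2, 0], validity bits 1, 1, 0.
restored = restore_nulls([1, 2, 0], bytes([0b00000011]))
```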
   
   ---
   
   ### Consume a `__dataframe__` object
   - [ ] Implement the `from_dataframe` method
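
As a rough illustration of the consuming side, the sketch below covers only the entry-point checks: confirm that the object exposes `__dataframe__`, call it, and walk the columns by name. `from_dataframe_sketch` and `_ToyInterchange` are hypothetical names used for demonstration. The toy producer returns plain lists, whereas a real implementation would return column objects and read their buffers.

```python
class _ToyInterchange:
    """Stand-in producer for demonstration; exposes only the methods used below."""

    def __init__(self, data):
        self._data = data

    def __dataframe__(self, nan_as_null=False, allow_copy=True):
        return self

    def column_names(self):
        return list(self._data)

    def get_column_by_name(self, name):
        return self._data[name]


def from_dataframe_sketch(df, allow_copy=True):
    """Collect the columns of any object that implements `__dataframe__`."""
    if not hasattr(df, "__dataframe__"):
        raise ValueError("`df` must have a `__dataframe__` method")
    interchange = df.__dataframe__(allow_copy=allow_copy)
    return {
        name: interchange.get_column_by_name(name)
        for name in interchange.column_names()
    }


result = from_dataframe_sketch(_ToyInterchange({"a": [1, 2, None]}))
```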


