[I] Flatten column of list of struct type and convert to pandas [arrow]

via GitHub Thu, 09 Nov 2023 00:47:54 -0800


sergun opened a new issue, #38643:
URL: https://github.com/apache/arrow/issues/38643


   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   I have a pa.Table with a nested column, events:
   ```
   id           int64
   events       list<item: struct<tm: timestamp[s], sum: int64>>
   ```
   It is easy to convert it to pandas with the pa.Table.to_pandas() method, but 
that creates a pd.DataFrame whose events column has object dtype:
   ```
   id           int64
   events       object
   ```
   Flattening the data further in pandas is then inefficient.
   
   How can I efficiently convert the table in PyArrow to a flattened pd.DataFrame 
with columns id, tm, and sum?
   
   This is possible, for example, in Spark (powered by Arrow):
   ```
   df.select("id", explode("events")).select("id", "col.*")
   ```
   I hope this is also possible using PyArrow alone.
   
   ### Component(s)
   
   Python


