Re: [I] [Python] Flatten column of list of struct type and convert to pandas [arrow]

via GitHub Mon, 13 Nov 2023 11:35:09 -0800


sergun commented on issue #38643:
URL: https://github.com/apache/arrow/issues/38643#issuecomment-1808896791


   > Exploding is currently something that isn't provided out of the box, see #27923 for an issue on this topic and some example workarounds (using existing pyarrow compute functions to achieve the same effect).
   > 
   > Once you have exploded the list over multiple rows, you can flatten the table with the struct type into a table with a top-level column for each struct field with the `flatten()` method:
   > 
   > ```
   > >>> import pandas as pd
   > >>> import pyarrow as pa
   > >>> table = pa.table({"id": [1, 1, 2], "events": [{"tm": pd.Timestamp("2012-01-01"), "sum": 10}] * 3})
   > >>> table.to_pandas()
   >    id                                  events
   > 0   1  {'sum': 10, 'tm': 2012-01-01 00:00:00}
   > 1   1  {'sum': 10, 'tm': 2012-01-01 00:00:00}
   > 2   2  {'sum': 10, 'tm': 2012-01-01 00:00:00}
   > 
   > >>> table.flatten().to_pandas() 
   >    id  events.sum  events.tm
   > 0   1          10 2012-01-01
   > 1   1          10 2012-01-01
   > 2   2          10 2012-01-01
   > ```
   
   Thanks a lot!
   The workaround from #27923 combined with `pa.Table.flatten()` solves the issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
