Re: [I] [Python] Flatten column of list of struct type and convert to pandas [arrow]

via GitHub Thu, 09 Nov 2023 04:47:34 -0800


jorisvandenbossche commented on issue #38643:
URL: https://github.com/apache/arrow/issues/38643#issuecomment-1803767114


   Exploding a list column over multiple rows is not currently provided out of 
the box in pyarrow. See https://github.com/apache/arrow/issues/27923 for the 
issue on this topic and for some example workarounds (which use existing 
pyarrow compute functions to achieve the same effect). 
   
   Once you have exploded the list over multiple rows, you can flatten the 
table with the struct column into a table with one top-level column per struct 
field, using the `flatten()` method:
   
   ```
   >>> import pandas as pd
   >>> import pyarrow as pa
   >>> table = pa.table({"id": [1, 1, 2], "events": [{"tm": pd.Timestamp("2012-01-01"), "sum": 10}] * 3})
   >>> table.to_pandas()
      id                                  events
   0   1  {'sum': 10, 'tm': 2012-01-01 00:00:00}
   1   1  {'sum': 10, 'tm': 2012-01-01 00:00:00}
   2   2  {'sum': 10, 'tm': 2012-01-01 00:00:00}
   
   >>> table.flatten().to_pandas() 
      id  events.sum  events.tm
   0   1          10 2012-01-01
   1   1          10 2012-01-01
   2   2          10 2012-01-01
   ```

