[GitHub] [arrow] jorisvandenbossche commented on issue #15178: [Python] `Table.slice` not updating `pandas_metadata`

GitBox Wed, 18 Jan 2023 08:46:15 -0800


jorisvandenbossche commented on issue #15178:
URL: https://github.com/apache/arrow/issues/15178#issuecomment-1387374398


   The pandas metadata is quite a primitive solution, initially implemented to 
ensure a correct roundtrip between pandas <-> arrow/parquet. That works for exact 
roundtrips, but once you do some intermediate operations on the arrow table, 
this can easily break down (e.g. you could also change columns), and we currently 
don't guarantee that the metadata is updated through operations. 
   
   So I would tend to label this as "won't-fix". 
   
   For slice itself, it might be relatively easy to update the pandas metadata 
to follow this change. But what about a similar operation, for example when you 
filter the table with some condition? Given that there are so many potential 
ways the metadata could get out of sync, I am hesitant to special-case slicing.
   
   When converting with `to_pandas`, we check whether the metadata about a 
range index still matches the length of the table, and if not, we just produce a 
default index for the resulting pandas.DataFrame. That is the reason that in 
your last code example the index seems to be "dropped".


