OfekShilon commented on PR #15252:
URL: https://github.com/apache/arrow/pull/15252#issuecomment-1375934621

    
   > row names are removed from metadata for a few reasons:
   > 
   > * They are length `n`, so they can easily get too large to store in schema metadata
   > * If rows are filtered in arrow prior to converting to an R data.frame, the row names won't be filtered (they're not in the table), and their length won't match the resulting data.
   > 
   > If you want to keep row names, they need to be added to the table. We could try to develop a convention for how to do that automatically in the R package, though the best solution is probably for you to be explicit about adding the column to the table yourself.
   
   Thanks for your prompt reply. 
   
   As users, we expect the save/load round trip to be a no-op. I am not sure that silently dropping row.names (or other attributes), just because they are less convenient implementation-wise, is the best design choice.
   
   I can suggest quite a few ways to avoid this unpleasant surprise, for example:
   (1) Serialize row.names as regular column data (under some designated name) and deserialize them directly into an attribute (best for the user).
   (2) Add a `save_row_names` parameter to write_feather, even one defaulting to false, _but have it documented_.
   (3) Emit a warning such as "You're trying to save a large object with row names. Consider putting them in a column for better performance and stability" (probably best for the implementer).
   Hopefully something like this can be done.
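
   As a rough illustration of option (1), done by hand on the user side, here is a minimal sketch. The helper names `stash_row_names` and `restore_row_names` and the column name `.row_names` are hypothetical, chosen only for this example; they are not part of the arrow package. The Feather round trip runs only if arrow is installed.

   ```r
   # Hypothetical column name used to carry row names; not an arrow convention.
   ROW_NAMES_COL <- ".row_names"

   # Copy row names into a regular column so they travel with the data.
   stash_row_names <- function(df, col = ROW_NAMES_COL) {
     stopifnot(is.data.frame(df), !(col %in% names(df)))
     df[[col]] <- rownames(df)
     rownames(df) <- NULL
     df
   }

   # Move the designated column back into the row.names attribute.
   restore_row_names <- function(df, col = ROW_NAMES_COL) {
     df <- as.data.frame(df)
     if (col %in% names(df)) {
       rn <- df[[col]]
       df[[col]] <- NULL
       rownames(df) <- rn
     }
     df
   }

   df <- data.frame(x = c(1.5, 2.5, 3.5), row.names = c("a", "b", "c"))
   stashed <- stash_row_names(df)

   # Round trip through Feather only if the arrow package is available.
   if (requireNamespace("arrow", quietly = TRUE)) {
     path <- tempfile(fileext = ".feather")
     arrow::write_feather(stashed, path)
     stashed <- arrow::read_feather(path)
   }

   restored <- restore_row_names(stashed)
   ```

   Because the row names live in an ordinary column here, any filtering done on the table before conversion back to a data.frame would filter them too, which addresses the length-mismatch concern in the quoted reply.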


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
