Yannaubineau commented on issue #40050:
URL: https://github.com/apache/arrow/issues/40050#issuecomment-1945653461

   Hi @amoeba, thank you for your answer. Sorry if it is a non-issue.
   
   The main problem I faced was the lack of any indication of the source of the 
error, and the absence of any warning before the error occurred.
   
   Thank you for the sample code, it works like a charm!
   
   I think there are two aspects to this:
   
   - The error output is confusing: `Is this a 'parquet' file?` doesn't feel 
right if the error is known to be related to a string size limit parameter. **So 
informing the user of this parameter inside the error message would definitely 
be an improvement.** 
   - But I also think that it could be very informative to trigger a warning 
when a parquet file is created (through `write_parquet` or `write_dataset`) 
**from a data.frame containing attributes**, simply because of how massive the 
parquet file can get compared to one written from the same data.frame without 
any attributes. **Users should be made aware in some way that attributes in 
their data are impeding the efficiency of the binary data storage.**

