emkornfield commented on issue #39676:
URL: https://github.com/apache/arrow/issues/39676#issuecomment-1897645893

   Having thousands of row groups is an anti-pattern for laying out data, and 
it creates exactly this type of performance bottleneck. I understand some 
customers do it, and sometimes this is out of our control, but we should audit 
the write config parameters to make sure nothing there is causing this type of 
spilling. And yes, in general, Parquet is not well suited to very large 
column widths.  I think there is a better solution here, but given that this 
touches metadata serialization, I'm not sure how much appetite there will be 
for trying to incorporate metadata that parses faster.  In any case, format 
changes need to be discussed on the Parquet mailing list [email protected]
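
To make the scaling concrete, here is a minimal sketch (not part of the original comment) that counts how many column-chunk metadata entries a Parquet footer has to carry. It relies on one property of the format: the footer stores one column-chunk entry per column per row group. The helper name `footer_column_chunks` and the example numbers are hypothetical, chosen only for illustration.

```python
def footer_column_chunks(num_rows: int, rows_per_row_group: int, num_columns: int) -> int:
    """Count column-chunk metadata entries in a Parquet footer.

    Hypothetical helper for illustration: the footer holds one entry
    per column per row group.
    """
    if rows_per_row_group <= 0:
        raise ValueError("rows_per_row_group must be positive")
    # Integer ceiling division: the last row group may be partially filled.
    num_row_groups = -(-num_rows // rows_per_row_group)
    return num_row_groups * num_columns


# Illustrative numbers only (not taken from the issue).
small_groups = footer_column_chunks(100_000_000, 10_000, 1_000)
large_groups = footer_column_chunks(100_000_000, 1_000_000, 1_000)
print(small_groups, large_groups)
```

Under these assumed numbers, shrinking the row group size by 100x grows the number of footer entries a reader must deserialize by the same factor, which is why many small row groups combined with many columns makes metadata parsing expensive.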


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
