nealrichardson commented on issue #45373:
URL: https://github.com/apache/arrow/issues/45373#issuecomment-3696679818

   > Is there consequence to collapsing when not required for the symptoms above?
   
   The main reason you don't want to start inserting `collapse()` everywhere is 
that it prevents predicate and projection pushdown (i.e. only reading in the 
rows/files and columns you need for the query) for anything that comes after it. 
You won't see that penalty on in-memory tables though, only when you're querying 
datasets on disk or over the network. 
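
   As a rough illustration of the ordering issue, here is a minimal sketch 
assuming the R `arrow` and `dplyr` packages. The temporary directory, the use of 
`mtcars`, and the `kpl` column are all made up for the example and are not taken 
from the issue. Both queries return the same rows; the difference, per the 
explanation above, is only in how much the scan would be able to skip.

```r
library(arrow)
library(dplyr)

# Illustrative data only: write mtcars as a small partitioned dataset in a
# temporary directory (one subdirectory per value of `cyl`).
path <- tempfile("collapse_demo_")
write_dataset(mtcars, path, partitioning = "cyl")

ds <- open_dataset(path)

# filter() and select() are part of the same query as the dataset scan, so
# they are candidates for being pushed down to the scan.
early_filter <- ds %>%
  filter(cyl == 4) %>%
  mutate(kpl = mpg * 0.425) %>%
  select(kpl, cyl) %>%
  collect()

# Here filter() and select() come after collapse(). As described above, this
# is the arrangement in which they would not be pushed down past collapse().
late_filter <- ds %>%
  mutate(kpl = mpg * 0.425) %>%
  collapse() %>%
  filter(cyl == 4) %>%
  select(kpl, cyl) %>%
  collect()
```

   On a dataset this small any difference would be hard to notice; the cost 
would be expected to grow with the number of files and columns that could 
otherwise have been skipped.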
   
   Rather than trying to develop a heuristic for when you should use 
`collapse()`, we should just fix the bug :) If we're doing things right, I 
don't think there's ever a reason you should need to use it--that's probably 
why you didn't know about it before.
   
   @eitsupi thanks for that pointer, that's worth considering.

