gabotechs commented on PR #18014:
URL: https://github.com/apache/datafusion/pull/18014#issuecomment-3418687673

   Not really related to this PR specifically, but here are my two cents about 
some things that could be happening:
   
   > I ran into this using datafusion-distributed which I think makes the issue 
of partition execution time skew even more likely to happen
   
   Not completely sure, but I'd say it should not be making things much worse. 
One thing that `datafusion-distributed` does is artificially scale up the 
output partitions of `RepartitionExec` nodes, as they are a core piece for 
performing network shuffles. This means that you can end up in situations with 
a `RepartitionExec` that has just 8 input partitions but 1000+ output 
partitions. Not sure how that affects this problem, but leaving the info here 
in case someone finds it relevant.
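
   To make the 8-to-1000+ fan-out concrete, here is a minimal sketch in plain 
Rust (standard library only). It is not DataFusion's or 
`datafusion-distributed`'s actual code: `hash_repartition` is a hypothetical 
helper, and the row counts are made-up numbers chosen only for illustration. 
It routes rows from a few input partitions into many output partitions by 
hashing a key, which is roughly what a hash-based repartition does 
conceptually.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper (not a DataFusion API): route every key from the
/// input partitions to one of `n_out` output partitions by hashing it.
fn hash_repartition(inputs: &[Vec<u64>], n_out: usize) -> Vec<Vec<u64>> {
    let mut outputs: Vec<Vec<u64>> = vec![Vec::new(); n_out];
    for input in inputs {
        for &key in input {
            let mut hasher = DefaultHasher::new();
            key.hash(&mut hasher);
            let idx = (hasher.finish() % n_out as u64) as usize;
            outputs[idx].push(key);
        }
    }
    outputs
}

fn main() {
    // Assumed shape: 8 input partitions with 1,000 rows each.
    let inputs: Vec<Vec<u64>> = (0..8u64)
        .map(|p| (0..1000u64).map(|i| p * 1000 + i).collect())
        .collect();

    // Fan out to 1,000 output partitions, as in the scenario described above.
    let outputs = hash_repartition(&inputs, 1000);

    let total: usize = outputs.iter().map(|p| p.len()).sum();
    let largest = outputs.iter().map(|p| p.len()).max().unwrap_or(0);
    let smallest = outputs.iter().map(|p| p.len()).min().unwrap_or(0);
    println!(
        "outputs: {}, rows: {}, smallest: {}, largest: {}",
        outputs.len(),
        total,
        smallest,
        largest
    );
}
```

   With this shape each output partition receives only a handful of rows on 
average, so any per-partition overhead is spread across far more partitions 
than there are inputs. Whether that matters for the skew discussed here is an 
open question.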
   
   > So what I did was at least make queries that would have previously fail 
continue forward with disk spilling
   
   Note that DataFusion can currently over-account memory severely in certain 
situations (more info in 
https://github.com/apache/datafusion/issues/16841). So even if this PR has 
value on its own, it would be worth double-checking that you are not hitting 
one of those situations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

