gene-bordegaray opened a new issue, #19131:
URL: https://github.com/apache/datafusion/issues/19131

   ### Describe the bug
   
   When a parent operator requires a hash partitioning in `EnforceDistribution` 
rule, there exists a conition that theoretically should add round-robin 
partitioning. The conditions for this are:
   
   If the parent operator requires hash partitioning
   - If hash repartitioning is not necessary (we are not already correctly 
hashed) and we want to add a round-robin => add a round-robin repartition on top
   - If hash is necessary (we do not meet the hash requirements) => add a hash 
repartition on top
   
   [link to 
code](https://github.com/apache/datafusion/blob/6746007826ebd3fcb5614bf87183674435bbb134/datafusion/physical-optimizer/src/enforce_distribution.rs#L1279)
   
   This logic is incorrect as if the parent requires hash repartitioning and we 
are already hash partitioned, we should never round-robin repartition as this 
would break our hash partitioning.
   
   This logic is proved to be incorrect as Datafusion actually never evaluates 
the first condition as true.
   
   ### To Reproduce
   
   Run all sqllogictests and benchmarks with this condition. 
   
   Delete it, and run them all again. No plans will change (it's dead code)
   
   ### Expected behavior
   
   The correct logic should be
   
   If the parent operator requires hash partitioning
   - If we are already hashed correctly, don't do anything. If we insert a 
round-robin we will break out partitioning
   - If we a hash is required, just insert a hash repartition
   
   ### Additional context
   
   I actually created this dead code in issue #18341 to avoid consecutive 
repartitions, but glanced over the fact that this round-robin should never be 
happening (and the change makes it so that it doesn't) in the first place


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to