adriangb commented on issue #19580:
URL: https://github.com/apache/datafusion/issues/19580#issuecomment-3745059018

   The issue is that the scans are not aligned with the join partitions.
   For example, a single file `file1.parquet` can contain rows that belong to 
hash join partitions 0, 1 and 2, but the file itself ends up in only one of 
them.
   So when we read a row from that file, the filter in effect is 
`CASE (hash_repartition % 3) WHEN 0 THEN <actual filter> WHEN 1 THEN true 
WHEN 2 THEN true END`. Only partition 0's build side has finished here, so 
rows that hash into partition 1 or 2 are not eliminated. Depending on how 
much latency waiting for those other build-side partitions to finish would 
have added, it may have been better to wait.
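
   A minimal sketch of that behaviour, in plain Python rather than DataFusion 
code. The names `hash_partition` and `make_scan_filter`, the use of 
`key % 3` as the hash, and the sample keys are all assumptions made for 
illustration; a build side that has not finished is modelled as `None`, 
which plays the role of the `true` branches in the `CASE` expression.

```python
NUM_PARTITIONS = 3


def hash_partition(key):
    # Stand-in for the join's hash repartitioning (assumption: key % 3).
    return key % NUM_PARTITIONS


def make_scan_filter(build_keys):
    # build_keys[p] is the set of build-side keys for partition p,
    # or None when that partition's build side has not finished yet.
    def keep(key):
        keys = build_keys[hash_partition(key)]
        if keys is None:
            # Equivalent to the "WHEN p THEN true" branch: nothing is pruned.
            return True
        # Equivalent to "<actual filter>" for a finished partition.
        return key in keys

    return keep


# One file whose rows hash into all three join partitions.
file_rows = list(range(9))

# Only partition 0 has finished building; partitions 1 and 2 are pending.
partial = make_scan_filter([{0, 3}, None, None])
kept_partial = [k for k in file_rows if partial(k)]

# Every build side has finished, so every partition has a real filter.
complete = make_scan_filter([{0, 3}, {1}, set()])
kept_complete = [k for k in file_rows if complete(k)]

print(kept_partial)
print(kept_complete)
```

   With only partition 0 finished, the scan drops just the one non-matching 
row that hashes to partition 0. With all three finished, it also drops the 
non-matching rows for partitions 1 and 2, which is the pruning that is given 
up by not waiting.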



