Re: [PR] [CORE] Use the smaller table to build hashmap in shuffled hash join [incubator-gluten]

via GitHub Thu, 23 May 2024 20:25:36 -0700


WangGuangxin commented on PR #5750:
URL: 
https://github.com/apache/incubator-gluten/pull/5750#issuecomment-2128446464


   > > it can naturally support choosing the build side by size, right?
   > 
   > @WangGuangxin A deeper issue might be that the size estimation could be 
inaccurate if there is an aggregate or filter, e.g. in some cases the 
aggregated data has far fewer rows than its input and becomes the smaller 
table, but Spark still treats it as the larger table.
   
   @rui-mo AQE also goes through `JoinSelection` when it re-optimizes a newly 
submitted stage. At that point the `stats` are not estimates; they are the 
real data size of the `ShuffleQueryStage`.
   <img width="721" alt="image" 
src="https://github.com/apache/incubator-gluten/assets/1312321/703c50ce-dc32-4258-ba89-d25238125ef5">
   
   If there is no `Exchange` before the `Join`, then yes, the stats are 
estimated and we can't have accurate stats.
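
To make the distinction concrete, here is a minimal Python sketch of picking the build side by size. It is only a toy model, not Spark's or Gluten's actual code: `SideStats`, `choose_build_side`, the tie-breaking rule, and all byte counts are hypothetical and chosen purely for illustration. It shows how an overestimated aggregate side can lead to the wrong build side, while the real shuffle size seen after an `Exchange` corrects the choice.

```python
from dataclasses import dataclass

# Hypothetical model: none of these names are Spark or Gluten APIs.
@dataclass(frozen=True)
class SideStats:
    size_in_bytes: int
    is_runtime: bool  # True if the size comes from a materialized shuffle stage


def choose_build_side(left: SideStats, right: SideStats) -> str:
    """Return which side should build the hash map: the smaller one.

    Ties go to the right side; that tie-break is an assumption of this sketch.
    """
    return "right" if right.size_in_bytes <= left.size_in_bytes else "left"


MB = 1024 * 1024

# Left side is an aggregate over a large input. Without runtime stats the
# planner only has an estimate, which here stays close to the input size.
left_estimated = SideStats(size_in_bytes=10_000 * MB, is_runtime=False)
right_estimated = SideStats(size_in_bytes=1_000 * MB, is_runtime=False)

# After the shuffle stages materialize, the real sizes are known, and the
# aggregated side turns out to be the smaller one.
left_runtime = SideStats(size_in_bytes=5 * MB, is_runtime=True)
right_runtime = SideStats(size_in_bytes=1_000 * MB, is_runtime=True)

static_choice = choose_build_side(left_estimated, right_estimated)
runtime_choice = choose_build_side(left_runtime, right_runtime)
```

With the estimated sizes the sketch builds on the right side, even though the aggregated left side is actually much smaller; with the runtime sizes it builds on the left side.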
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


