WangGuangxin commented on PR #5750: URL: https://github.com/apache/incubator-gluten/pull/5750#issuecomment-2128446464
> > it can naturally support choose build side by size, right?
>
> @WangGuangxin A deeper issue might be the size estimation could be inaccurate if there is an aggregate or filter, e.g. in some case, the aggregated data has much less rows than its input and becomes the smaller table, but Spark still treats it as the larger table.

@rui-mo AQE also goes through `JoinSelection` when it re-optimizes a newly submitted stage. At that point the `stats` are not estimated; they are the real data size of the `ShuffleQueryStage`.

<img width="721" alt="image" src="https://github.com/apache/incubator-gluten/assets/1312321/703c50ce-dc32-4258-ba89-d25238125ef5">

If there is no `Exchange` before the `Join`, then yes, the stats are estimated and we can't get accurate stats.
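To make the distinction concrete, here is a minimal toy model in Python of the two cases discussed above. It is not Spark or Gluten code: `PlanSide`, `size_in_bytes`, and `choose_build_side` are hypothetical names, and the byte counts are made-up illustration values. The only assumption it encodes is the one stated in the comment: a side that sits behind a materialized shuffle stage reports its real size, while a side without an `Exchange` only has the static estimate.

```python
from dataclasses import dataclass
from typing import Optional

MB = 1024 * 1024
GB = 1024 * MB


@dataclass
class PlanSide:
    """One input of a join (hypothetical toy type, not a Spark class)."""
    name: str
    estimated_bytes: int                  # static estimate from the plan
    runtime_bytes: Optional[int] = None   # known only after a shuffle stage materializes

    def size_in_bytes(self) -> int:
        # Prefer the measured size when a materialized stage provides one,
        # otherwise fall back to the static estimate.
        if self.runtime_bytes is not None:
            return self.runtime_bytes
        return self.estimated_bytes


def choose_build_side(left: PlanSide, right: PlanSide) -> str:
    """Pick the side with the smaller reported size (ties go to the right side)."""
    if right.size_in_bytes() <= left.size_in_bytes():
        return right.name
    return left.name


# Illustration values only: an aggregate whose input is large but whose output is tiny.
other = PlanSide("other", estimated_bytes=1 * GB)
agg_no_exchange = PlanSide("aggregated", estimated_bytes=10 * GB)
agg_after_shuffle = PlanSide("aggregated", estimated_bytes=10 * GB, runtime_bytes=5 * MB)

print(choose_build_side(agg_no_exchange, other))    # estimate only: picks "other"
print(choose_build_side(agg_after_shuffle, other))  # real shuffle size: picks "aggregated"
```

In this sketch the aggregated side is chosen as the build side only when a runtime size is available, which mirrors the point that the estimate is the weak spot when no `Exchange` precedes the `Join`.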
