SortMergeJoin sorts its children by the join keys, but broadcast join does not, so I think the output ordering of a broadcast join has nothing to do with the join keys.
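
To make the point concrete, here is a toy model in plain Python. It is not Spark code, and `broadcast_hash_join` is a hypothetical helper written only for this illustration. It assumes a broadcast hash join builds a hash table from the small side and streams the big side through it, so rows come out in whatever order the streamed side already had, regardless of the join key.

```python
from collections import defaultdict


def broadcast_hash_join(streamed, broadcast, key):
    """Toy inner hash join: hash the small side, stream the big side."""
    # Build phase: index the small ("broadcast") side by join key.
    table = defaultdict(list)
    for row in broadcast:
        table[row[key]].append(row)

    # Probe phase: rows are emitted in the order of the streamed side.
    out = []
    for left in streamed:
        for right in table.get(left[key], []):
            out.append({**left, **right})
    return out


# The big table is sorted by "ts", not by the join key "k".
big = [
    {"k": 3, "ts": 1},
    {"k": 1, "ts": 2},
    {"k": 2, "ts": 3},
    {"k": 1, "ts": 4},
]
small = [{"k": 1, "name": "a"}, {"k": 2, "name": "b"}, {"k": 3, "name": "c"}]

joined = broadcast_hash_join(big, small, "k")
ts_order = [r["ts"] for r in joined]   # follows the streamed side's order
key_order = [r["k"] for r in joined]   # not sorted by the join key
```

In this sketch the output stays sorted on `ts` because the big side was, while the join key column comes out unsorted. Whether the real operator can safely report such an ordering is a separate question from what this toy shows.
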

On Thu, Jun 28, 2018 at 11:28 PM Marco Gaido <marcogaid...@gmail.com> wrote:

> I think the outputOrdering would be that of the big table (if any), and
> it wouldn't matter whether it involves the join keys or not. Am I wrong?
>
> 2018-06-28 17:01 GMT+02:00 吴晓菊 <chrysan...@gmail.com>:
>
>> Thanks for the reply.
>> Looking into SortMergeJoinExec, I think we can follow what
>> SortMergeJoin does: for some types of join, if the children are ordered on
>> the join keys, we can report the ordered join keys as the output ordering.
>>
>> Chrysan Wu
>> 吴晓菊
>>
>> 2018-06-28 22:53 GMT+08:00 Wenchen Fan <cloud0...@gmail.com>:
>>
>>> SortMergeJoin only reports ordering of the join keys, not the output
>>> ordering of any child.
>>>
>>> It seems reasonable to me that broadcast join should respect the output
>>> ordering of the children. Feel free to submit a PR to fix it, thanks!
>>>
>>> On Thu, Jun 28, 2018 at 10:07 PM 吴晓菊 <chrysan...@gmail.com> wrote:
>>>
>>>> Why can't we use the output order of the big table?
>>>>
>>>> 2018-06-28 21:48 GMT+08:00 Marco Gaido <marcogaid...@gmail.com>:
>>>>
>>>>> The easy answer is that SortMergeJoin ensures an outputOrdering,
>>>>> while BroadcastHashJoin doesn't, i.e. after running a
>>>>> BroadcastHashJoin you don't know what the order of the output
>>>>> is going to be, since nothing enforces it.
>>>>>
>>>>> Hope this helps.
>>>>> Thanks.
>>>>> Marco
>>>>>
>>>>> 2018-06-28 15:46 GMT+02:00 吴晓菊 <chrysan...@gmail.com>:
>>>>>
>>>>>> We see that SortMergeJoinExec implements both outputPartitioning
>>>>>> and outputOrdering, while BroadcastHashJoinExec implements only
>>>>>> outputPartitioning. Why is it designed this way?