Sorry for the mistake. You are right: the output ordering of a broadcast join can be the ordering of the big table for some join types. I will prepare a PR for you to review later. Thanks a lot!
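To illustrate the point about the big table's ordering, here is a minimal sketch in plain Python. It is a toy model, not Spark's implementation: `broadcast_hash_join`, `streamed`, `broadcast` and the sample rows are hypothetical names and data made up for illustration. It assumes the big table is the streamed (left) side and covers only inner and left outer joins; it says nothing about other join types.

```python
from collections import defaultdict


def broadcast_hash_join(streamed, broadcast, how="inner"):
    """Toy model of a broadcast hash join (hypothetical helper, not a Spark API).

    `streamed` is the big table, iterated in its existing order.
    `broadcast` is the small table, loaded into an in-memory hash table.
    Both are lists of (join_key, value) tuples.
    """
    # Build side: hash the small table on the join key.
    table = defaultdict(list)
    for key, value in broadcast:
        table[key].append(value)

    # Probe side: walk the big table row by row and emit matches immediately,
    # so output rows appear in the same relative order as the streamed rows.
    out = []
    for key, value in streamed:
        matches = table.get(key, [])
        if matches:
            for match in matches:
                out.append((key, value, match))
        elif how == "left_outer":
            out.append((key, value, None))
    return out


# Big table ordered on its second column, which is NOT the join key.
big_sorted = [(3, "a"), (1, "b"), (2, "c"), (1, "d")]
small = [(1, "x"), (2, "y"), (1, "z")]

joined = broadcast_hash_join(big_sorted, small, how="left_outer")
streamed_col = [row[1] for row in joined]

# The ordering of the big table survives the join, even though it has
# nothing to do with the join key.
assert streamed_col == sorted(streamed_col)
```

Because the probe loop never reorders the streamed rows, the existing ordering of the big table carries over to the output in this model, whether or not that ordering involves the join keys.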
Chrysan Wu
吴晓菊
Phone:+86 17717640807

2018-06-29 0:00 GMT+08:00 Wenchen Fan <cloud0...@gmail.com>:

> SortMergeJoin sorts its children by join key, but broadcast join does not.
> I think the output ordering of broadcast join has nothing to do with join
> key.
>
> On Thu, Jun 28, 2018 at 11:28 PM Marco Gaido <marcogaid...@gmail.com>
> wrote:
>
>> I think the outputOrdering would be the one of the big table (if any) and
>> it wouldn't matter if this involves the join keys or not. Am I wrong?
>>
>> 2018-06-28 17:01 GMT+02:00 吴晓菊 <chrysan...@gmail.com>:
>>
>>> Thanks for the reply.
>>> By looking into the SortMergeJoinExec, I think we can follow what
>>> SortMergeJoin do, for some types of join, if the children is ordered on
>>> join keys, we can output the ordered join keys as output ordering.
>>>
>>>
>>> Chrysan Wu
>>> 吴晓菊
>>> Phone:+86 17717640807
>>>
>>>
>>> 2018-06-28 22:53 GMT+08:00 Wenchen Fan <cloud0...@gmail.com>:
>>>
>>>> SortMergeJoin only reports ordering of the join keys, not the output
>>>> ordering of any child.
>>>>
>>>> It seems reasonable to me that broadcast join should respect the output
>>>> ordering of the children. Feel free to submit a PR to fix it, thanks!
>>>>
>>>> On Thu, Jun 28, 2018 at 10:07 PM 吴晓菊 <chrysan...@gmail.com> wrote:
>>>>
>>>>> Why we cannot use the output order of big table?
>>>>>
>>>>>
>>>>> Chrysan Wu
>>>>> Phone:+86 17717640807
>>>>>
>>>>>
>>>>> 2018-06-28 21:48 GMT+08:00 Marco Gaido <marcogaid...@gmail.com>:
>>>>>
>>>>>> The easy answer to this is that SortMergeJoin ensure an
>>>>>> outputOrdering, while BroadcastHashJoin doesn't, ie. after running a
>>>>>> BroadcastHashJoin you don't know which is going to be the order of the
>>>>>> output since nothing enforces it.
>>>>>>
>>>>>> Hope this helps.
>>>>>> Thanks.
>>>>>> Marco
>>>>>>
>>>>>> 2018-06-28 15:46 GMT+02:00 吴晓菊 <chrysan...@gmail.com>:
>>>>>>
>>>>>>>
>>>>>>> We see SortMergeJoinExec is implemented with
>>>>>>> outputPartitioning&outputOrdering
>>>>>>> while BroadcastHashJoinExec is only implemented with outputPartitioning.
>>>>>>> Why is the design?
>>>>>>>
>>>>>>> Chrysan Wu
>>>>>>> Phone:+86 17717640807