SortMergeJoin requires its children to be sorted by the join keys, but
broadcast join does not. So I think the output ordering of a broadcast join
has nothing to do with the join keys.
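
To make the difference concrete, here is a toy Python model of the two
strategies. It is only a sketch, not Spark code: the function names
`broadcast_hash_join` and `sort_merge_join` and the sample rows are made up
for illustration, and it assumes an inner join where the big table is the
streamed side.

```python
from collections import defaultdict
from operator import itemgetter


def broadcast_hash_join(streamed, broadcast):
    """Toy inner join: hash the small side, stream the big side in order."""
    table = defaultdict(list)
    for key, value in broadcast:
        table[key].append(value)
    # Rows come out in whatever order the streamed side arrives.
    return [(key, left, right)
            for key, left in streamed
            for right in table.get(key, [])]


def sort_merge_join(left, right):
    """Toy inner join: sort both sides by key, then merge."""
    left = sorted(left, key=itemgetter(0))
    right = sorted(right, key=itemgetter(0))
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side, emit every pairing,
            # then skip past both runs.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for l in left[i:i_end]:
                for r in right[j:j_end]:
                    out.append((lk, l[1], r[1]))
            i, j = i_end, j_end
    return out


# Hypothetical sample data: the big table is NOT sorted by key.
big = [(3, "c"), (1, "a"), (2, "b"), (1, "d")]
small = [(1, "x"), (2, "y"), (3, "z")]

bhj = broadcast_hash_join(big, small)
smj = sort_merge_join(big, small)
```

In this model the keys of `bhj` come out as 3, 1, 2, 1 (the order of the
streamed side), while the keys of `smj` come out as 1, 1, 2, 3. Both contain
the same rows; only the sort-merge result is ordered by the join key.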

On Thu, Jun 28, 2018 at 11:28 PM Marco Gaido <marcogaid...@gmail.com> wrote:

> I think the outputOrdering would be that of the big table (if any), and
> it wouldn't matter whether it involves the join keys or not. Am I wrong?
>
> 2018-06-28 17:01 GMT+02:00 吴晓菊 <chrysan...@gmail.com>:
>
>> Thanks for the reply.
>> By looking into SortMergeJoinExec, I think we can follow what
>> SortMergeJoin does: for some join types, if the children are ordered on
>> the join keys, we can report the ordered join keys as the output ordering.
>>
>>
>> Chrysan Wu
>> 吴晓菊
>> Phone:+86 17717640807
>>
>>
>> 2018-06-28 22:53 GMT+08:00 Wenchen Fan <cloud0...@gmail.com>:
>>
>>> SortMergeJoin only reports ordering of the join keys, not the output
>>> ordering of any child.
>>>
>>> It seems reasonable to me that broadcast join should respect the output
>>> ordering of the children. Feel free to submit a PR to fix it, thanks!
>>>
>>> On Thu, Jun 28, 2018 at 10:07 PM 吴晓菊 <chrysan...@gmail.com> wrote:
>>>
>>>> Why can't we use the output ordering of the big table?
>>>>
>>>> 2018-06-28 21:48 GMT+08:00 Marco Gaido <marcogaid...@gmail.com>:
>>>>
>>>>> The easy answer is that SortMergeJoin ensures an outputOrdering,
>>>>> while BroadcastHashJoin doesn't, i.e. after running a
>>>>> BroadcastHashJoin you don't know what the order of the output will be,
>>>>> since nothing enforces it.
>>>>>
>>>>> Hope this helps.
>>>>> Thanks.
>>>>> Marco
>>>>>
>>>>> 2018-06-28 15:46 GMT+02:00 吴晓菊 <chrysan...@gmail.com>:
>>>>>
>>>>>>
>>>>>> We see that SortMergeJoinExec implements both outputPartitioning and
>>>>>> outputOrdering, while BroadcastHashJoinExec implements only
>>>>>> outputPartitioning. Why is it designed this way?
>>>>>
>>>>
>>
>
