[ 
https://issues.apache.org/jira/browse/HIVE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844858#comment-13844858
 ] 

Remus Rusanu commented on HIVE-5595:
------------------------------------

This implementation is very similar to the vectorized MAP JOIN: it iterates 
over the input batch and calls super.processOp row-by-row. This has the 
advantage of working identically to the existing row-mode SMB join. The 
implementation requires only the big table to be vectorized; the small table(s) 
are not required to expose the vectorized interface. The way SMB join works is 
that it drives the processing on the small tables itself, from the processOp of 
the big table, and the way it drives it is entirely row-mode. Unfortunately, 
even if the small tables do expose vectorized execution, it is not used. That 
portion of the plan (FetchOperator->DummySinkOperator) is completely ignored 
during the vectorization. Going forward it would be desirable to provide a more 
complete vectorized execution plan for SMB plans, given that the 'small' 
table(s) may be (and often are) small only in name (i.e. not the 'BigTableAlias' 
in the SMBJoinDesc).

The implementations of VSMB and VMAPJOIN have a lot in common and much of the 
code repeats. I would like to refactor the code to be more DRY, but I would do 
that as a separate JIRA/patch to avoid impacting the existing VMAPJOIN now.
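
To illustrate the row-by-row approach described above, here is a minimal 
sketch. It is not the code from the patch: ToyBatch, ToyRowModeJoin and 
ToyVectorizedJoin are hypothetical stand-ins invented for this example, and 
the batch layout (column arrays plus an optional array of selected row 
indices) is an assumption made only for illustration. The vectorized operator 
walks the live rows of the batch, copies each one into a row buffer, and 
passes it to the inherited row-mode processOp.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical columnar batch; a toy stand-in, not Hive's actual batch class.
class ToyBatch {
    final long[][] cols;          // cols[c][r] = value of column c at row r
    final int[] selected;         // indices of live rows, used when selectedInUse
    final boolean selectedInUse;  // true if only the rows in 'selected' are live
    final int size;               // number of live rows in the batch

    ToyBatch(long[][] cols, int[] selected, boolean selectedInUse, int size) {
        this.cols = cols;
        this.selected = selected;
        this.selectedInUse = selectedInUse;
        this.size = size;
    }
}

// Hypothetical row-mode operator; here it only records the rows it receives.
class ToyRowModeJoin {
    final List<long[]> seen = new ArrayList<>();

    void processOp(long[] row) {
        seen.add(row.clone());
    }
}

// Hypothetical vectorized wrapper: accepts a batch, delegates one row at a time.
class ToyVectorizedJoin extends ToyRowModeJoin {
    void processBatch(ToyBatch batch) {
        long[] row = new long[batch.cols.length];  // row buffer reused across rows
        for (int i = 0; i < batch.size; i++) {
            // Resolve the physical row index, honoring the selection array.
            int r = batch.selectedInUse ? batch.selected[i] : i;
            for (int c = 0; c < row.length; c++) {
                row[c] = batch.cols[c][r];
            }
            super.processOp(row);  // the row-mode logic runs unchanged
        }
    }
}
```

Because the join logic itself stays in the row-mode parent, behavior matches 
the non-vectorized operator; the cost is that the per-row call remains in the 
inner loop.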

> Implement vectorized SMB JOIN
> -----------------------------
>
>                 Key: HIVE-5595
>                 URL: https://issues.apache.org/jira/browse/HIVE-5595
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Remus Rusanu
>            Assignee: Remus Rusanu
>            Priority: Critical
>         Attachments: HIVE-5595.1.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
