[ https://issues.apache.org/jira/browse/HIVE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844858#comment-13844858 ]
Remus Rusanu commented on HIVE-5595:
------------------------------------

This implementation is very similar to the vectorized MAP JOIN: it iterates over the input batch and calls super.processOp row by row. This has the advantage of working identically to the existing row-mode SMB join. The implementation requires only the big table to be vectorized; the small table(s) are not required to expose the vectorized interface.

SMB join drives the processing of the small tables itself, from the processOp of the big table, and it does so entirely in row mode. Unfortunately, even if the small tables do expose vectorized execution, it is not used: that portion of the plan (FetchOperator->DummySinkOperator) is completely ignored during vectorization. Going forward, it would be desirable to provide a more complete vectorized execution plan for SMB plans, given that the 'small' table(s) may be (and often are) small only in name (i.e. they are simply not the 'BigTableAlias' in the SMBJoinDesc).

The implementations of VSMB and VMAPJOIN have a lot in common, and much of the code repeats. I would like to refactor the code to be more DRY, but I would do that as a separate JIRA/patch to avoid impacting the existing VMAPJOIN now.

> Implement vectorized SMB JOIN
> -----------------------------
>
>                 Key: HIVE-5595
>                 URL: https://issues.apache.org/jira/browse/HIVE-5595
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Remus Rusanu
>            Assignee: Remus Rusanu
>            Priority: Critical
>         Attachments: HIVE-5595.1.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
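The batch-to-row adaptation described in the comment (a vectorized operator that walks its input batch and hands each row to the inherited row-mode processOp) can be sketched roughly as follows. This is a simplified illustration, not the patch itself: ColumnBatch, RowModeJoinOperator, VectorizedRowAdapterJoinOperator, and processBatch are all hypothetical names, and the row-mode operator merely records rows instead of advancing small-table cursors as a real SMB join would. The selection-vector handling is an assumption about how a filtered batch might be represented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified columnar batch (not Hive's actual batch class). */
class ColumnBatch {
    final long[][] columns;   // columns[c][r] holds column c of physical row r
    int size;                 // number of live (logical) rows in the batch
    int[] selected;           // physical indices of live rows, used when selectedInUse is true
    boolean selectedInUse;    // true if an upstream filter populated selected[0..size)

    ColumnBatch(long[][] columns, int size) {
        this.columns = columns;
        this.size = size;
        this.selected = new int[size];
        this.selectedInUse = false;
    }
}

/** Hypothetical row-mode join operator: consumes one row at a time. */
class RowModeJoinOperator {
    final List<Object[]> emitted = new ArrayList<>();

    /** Row-at-a-time entry point; a real SMB join would drive the small-table fetches here. */
    void processOp(Object[] row, int tag) {
        emitted.add(row);
    }
}

/** Vectorized front end that accepts whole batches but reuses the row-mode logic unchanged. */
class VectorizedRowAdapterJoinOperator extends RowModeJoinOperator {
    void processBatch(ColumnBatch batch, int tag) {
        int numCols = batch.columns.length;
        for (int i = 0; i < batch.size; i++) {
            // Translate the logical position into a physical row index, honoring the selection vector.
            int r = batch.selectedInUse ? batch.selected[i] : i;
            Object[] row = new Object[numCols];
            for (int c = 0; c < numCols; c++) {
                row[c] = batch.columns[c][r];   // autoboxed to Long
            }
            super.processOp(row, tag);          // delegate to the unchanged row-mode path
        }
    }
}

public class VectorizedSmbSketch {
    public static void main(String[] args) {
        long[][] cols = { {1, 2, 3}, {10, 20, 30} };
        ColumnBatch batch = new ColumnBatch(cols, 3);
        // Pretend an upstream filter kept only physical rows 0 and 2.
        batch.selected[0] = 0;
        batch.selected[1] = 2;
        batch.size = 2;
        batch.selectedInUse = true;

        VectorizedRowAdapterJoinOperator op = new VectorizedRowAdapterJoinOperator();
        op.processBatch(batch, 0);
        for (Object[] row : op.emitted) {
            System.out.println(Arrays.toString(row));   // prints [1, 10] then [3, 30]
        }
    }
}
```

The appeal of this design, as the comment notes, is that the join semantics stay identical to the row-mode operator; the cost is that per-row materialization gives up most of the benefit of vectorization on the join itself.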