[
https://issues.apache.org/jira/browse/HIVE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966247#action_12966247
]
Joydeep Sen Sarma commented on HIVE-1695:
-----------------------------------------
A couple of things to watch out for:
- Map-join uses a lot of memory on the mapper. I am not sure how the memory
settings are controlled, but we need to make sure that the map-join and the
sort (imposed by the ReduceSink) don't blow through the task heap limits. If
the RS is there because of a group by, the map-side hash aggregation will
also use memory.
- The work that Liyin has been doing converts regular joins into map-joins
automatically. I believe he generates several plans (map-join and
sort-merge join) and chooses one of them at runtime. Will the technique
discussed here apply to map-join plans generated by auto-map-joins? (I am not
sure, so asking.)
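
A rough sketch of the budget check implied by the first point above, not
anything Hive does today. The class name, the field names and the 20% headroom
are hypothetical. Mapping the fields to settings such as io.sort.mb (sort
buffer) and hive.map.aggr.hash.percentmemory (hash aggregation fraction) is
also an assumption about which knobs would matter once the stages are merged.

```python
from dataclasses import dataclass


@dataclass
class MapTaskMemory:
    """Hypothetical per-mapper memory estimate for a merged MapJoin + ReduceSink task."""

    heap_mb: float                  # task JVM heap, e.g. -Xmx from the child JVM opts
    mapjoin_table_mb: float         # estimated in-memory size of the map-join hash table
    sort_buffer_mb: float           # sort buffer used for the ReduceSink output
    hash_agg_fraction: float = 0.0  # heap fraction for map-side hash aggregation, 0 if no group by

    def required_mb(self) -> float:
        # All three consumers live in the same mapper JVM, so their needs add up.
        return (
            self.mapjoin_table_mb
            + self.sort_buffer_mb
            + self.hash_agg_fraction * self.heap_mb
        )

    def fits(self, headroom_fraction: float = 0.2) -> bool:
        # Keep some heap free for everything else (assumed 20% by default).
        usable_mb = self.heap_mb * (1.0 - headroom_fraction)
        return self.required_mb() <= usable_mb


# Join followed by a plain ReduceSink: 300 + 100 = 400 MB of a 1024 MB heap.
join_only = MapTaskMemory(heap_mb=1024, mapjoin_table_mb=300, sort_buffer_mb=100)

# Same join, but the RS comes from a group by with hash aggregation allowed
# half of the heap: 300 + 100 + 512 = 912 MB, which exceeds the usable heap.
join_group_by = MapTaskMemory(
    heap_mb=1024, mapjoin_table_mb=300, sort_buffer_mb=100, hash_agg_fraction=0.5
)

print(join_only.fits(), join_group_by.fits())
```

With these made-up numbers the join-only case fits and the group-by case does
not, which is the situation where keeping the two jobs separate (or lowering
one of the settings) would be the safer choice.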
> MapJoin followed by ReduceSink should be done as single MapReduce Job
> ---------------------------------------------------------------------
>
> Key: HIVE-1695
> URL: https://issues.apache.org/jira/browse/HIVE-1695
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Amareshwari Sriramadasu
>
> Currently MapJoin followed by ReduceSink runs as two MapReduce jobs : One map
> only job followed by a Map-Reduce job. It can be combined into single
> MapReduce Job.