[
https://issues.apache.org/jira/browse/HIVE-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971177#action_12971177
]
Namit Jain commented on HIVE-1695:
----------------------------------
Sorry for the delay in responding to this.
@Sreekanth, after https://issues.apache.org/jira/browse/HIVE-1642, we are
planning to slowly deprecate/ignore the MAPJOIN
hint, and do all the optimizations at runtime.
Today, a join followed by a group by runs as 2 MR jobs, or as 1 map-only job
followed by 1 MR job if
HIVE-1642 decides to convert the join into a map-join.
Your approach is certainly more optimal.
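To make the job counts above concrete, here is a rough Python sketch of the
plans for a join followed by a group by. It is not Hive code: the function
name plan_jobs and its two flags are made up for illustration, and the merged
plan is only an assumption about what the HIVE-1695 proposal would produce.

```python
def plan_jobs(join_is_map_join, merge_map_join=False):
    """Return the list of jobs for a join followed by a group by.

    Hypothetical model for illustration only; it does not mirror the
    actual Hive planner.
    """
    if not join_is_map_join:
        # Regular (reduce-side) join: one MR job for the join,
        # then a second MR job for the group by.
        return ["map-reduce (join)", "map-reduce (group by)"]
    if merge_map_join:
        # Assumed outcome of the HIVE-1695 proposal: the map-join runs in
        # the mappers of the job whose reducers do the group by.
        return ["map-reduce (map-join + group by)"]
    # Behavior described in this issue: a map-only job for the map-join,
    # followed by an MR job for the group by.
    return ["map-only (map-join)", "map-reduce (group by)"]


if __name__ == "__main__":
    for args in [(False,), (True,), (True, True)]:
        print(args, "->", plan_jobs(*args))
```

Only the third case saves a job; the first two both launch two jobs and differ
only in whether the first one needs a reduce phase.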
What is your use case? Are you concerned about a join followed by a group by
where the join key is the same as the group-by key?
Or are you concerned about a join followed by any operator that leads to a
reduce-sink?
As Joy said above, it is very important to carefully tune the memory for the
map-join, because the code assumes that there
are no memory-consuming operations going on. The only exception to this rule so
far was HIVE-1830.
We should not do any optimizations for map-join, but for general joins which
may be converted to map-joins at runtime.
> MapJoin followed by ReduceSink should be done as single MapReduce Job
> ---------------------------------------------------------------------
>
> Key: HIVE-1695
> URL: https://issues.apache.org/jira/browse/HIVE-1695
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Amareshwari Sriramadasu
> Assignee: Sreekanth Ramakrishnan
> Attachments: hive-1695-1.patch, hive-1695.patch
>
>
> Currently MapJoin followed by ReduceSink runs as two MapReduce jobs : One map
> only job followed by a Map-Reduce job. It can be combined into single
> MapReduce Job.