[
https://issues.apache.org/jira/browse/HIVE-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797007#action_12797007
]
Ning Zhang commented on HIVE-988:
---------------------------------
The job tracker page shows that the mapper completed, but the job was killed:
Hadoop job_200912262300_60175 on silver
User: nzhang
Job Name: select /*+ mapjoin(b) */ * fro...a.key=b.key(Stage-1)
Job File: hdfs://dfstmp.data.facebook.com:9000/tmp/mapred/SILVER/system/job_200912262300_60175/job.xml
Job Setup: Successful
Status: Killed
Started at: Tue Jan 05 16:51:57 PST 2010
Killed at: Tue Jan 05 17:04:12 PST 2010
Killed in: 12mins, 15sec
Job Cleanup: Successful
Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
map     100.00%     1          0        0        1         0       0 / 0
reduce  100.00%     0          0        0        0         0       0 / 0
> mapjoin should throw an error if the input is too large
> -------------------------------------------------------
>
> Key: HIVE-988
> URL: https://issues.apache.org/jira/browse/HIVE-988
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Reporter: Namit Jain
> Assignee: Ning Zhang
> Fix For: 0.5.0
>
> Attachments: HIVE-988.patch, HIVE-988_2.patch
>
>
> If the input to the map join is larger than a specific threshold, it may lead
> to a very slow execution of the join.
> It is better to throw an error, and let the user redo his query as a non
> map-join query.
> However, the current map-reduce framework will retry the mapper 4 times
> before actually killing the job.
> Based on an offline discussion among Dhruba, Ning, and myself, we came up
> with the following algorithm:
> Keep a threshold in the mapper for the number of rows to be processed for
> the map-join. If the number of rows exceeds that threshold, set a counter
> and kill that mapper.
> The client (ExecDriver) monitors the job continuously. If this counter is
> set, it kills the job and shows an appropriate error message to the user,
> so that the user can retry the query without the map join.
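
The algorithm quoted above can be sketched roughly as follows. This is a
minimal single-process simulation, not the code in the attached patches:
the class MapJoinGuardSketch, the JobCounters holder, the exception type,
and the runJob helper are all hypothetical names made up for illustration,
and no Hadoop or Hive APIs are used. It only shows the control flow: the
mapper counts rows, sets a counter and aborts once the threshold is
exceeded, and the client checks that counter to decide whether to kill the
job and tell the user to retry without the map join.

```java
import java.util.concurrent.atomic.AtomicLong;

public class MapJoinGuardSketch {

    /** Hypothetical stand-in for a job-level counter visible to mapper and client. */
    static final class JobCounters {
        final AtomicLong fatalError = new AtomicLong(0);
    }

    /** Thrown by the mapper to abort itself after it has set the counter. */
    static final class MapJoinTooLargeException extends RuntimeException {
        MapJoinTooLargeException(String msg) {
            super(msg);
        }
    }

    /** Mapper side: count rows and bail out once the threshold is exceeded. */
    static final class MapJoinMapper {
        private final long maxRows;
        private final JobCounters counters;
        private long rowsSeen = 0;

        MapJoinMapper(long maxRows, JobCounters counters) {
            this.maxRows = maxRows;
            this.counters = counters;
        }

        void processRow(String row) {
            rowsSeen++;
            if (rowsSeen > maxRows) {
                // Signal the client first, then kill this mapper attempt.
                counters.fatalError.incrementAndGet();
                throw new MapJoinTooLargeException(
                        "map-join input exceeded " + maxRows + " rows");
            }
            // The actual join work for this row would go here.
        }
    }

    enum JobStatus { SUCCEEDED, KILLED }

    /**
     * Client side: runs the (simulated) mapper, then inspects the counter.
     * A real client would poll the counter while the job is running.
     */
    static JobStatus runJob(long totalRows, long maxRows) {
        JobCounters counters = new JobCounters();
        MapJoinMapper mapper = new MapJoinMapper(maxRows, counters);
        try {
            for (long i = 0; i < totalRows; i++) {
                mapper.processRow("row-" + i);
            }
        } catch (MapJoinTooLargeException e) {
            // The mapper attempt died; the counter tells the client why.
        }
        if (counters.fatalError.get() > 0) {
            // Kill the whole job instead of letting the framework retry the mapper.
            System.err.println(
                    "Map-join input too large; retry the query without the map join.");
            return JobStatus.KILLED;
        }
        return JobStatus.SUCCEEDED;
    }

    public static void main(String[] args) {
        System.out.println(runJob(5, 10));   // below the threshold
        System.out.println(runJob(50, 10));  // above the threshold
    }
}
```

Checking the counter on the client, rather than relying on the mapper's
failure alone, is what avoids the retries mentioned in the description: the
client can tell this failure apart from a transient one and kill the job
right away.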