[ https://issues.apache.org/jira/browse/HIVE-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797007#action_12797007 ]
Ning Zhang commented on HIVE-988:
---------------------------------

The job tracker page shows that the mapper completed, but the job was killed.

Hadoop job_200912262300_60175 on silver

User: nzhang
Job Name: select /*+ mapjoin(b) */ * fro...a.key=b.key(Stage-1)
Job File: hdfs://dfstmp.data.facebook.com:9000/tmp/mapred/SILVER/system/job_200912262300_60175/job.xml
Job Setup: Successful
Status: Killed
Started at: Tue Jan 05 16:51:57 PST 2010
Killed at: Tue Jan 05 17:04:12 PST 2010
Killed in: 12mins, 15sec
Job Cleanup: Successful

Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
map     100.00%     1          0        0        1         0       0 / 0
reduce  100.00%     0          0        0        0         0       0 / 0

> mapjoin should throw an error if the input is too large
> -------------------------------------------------------
>
>                 Key: HIVE-988
>                 URL: https://issues.apache.org/jira/browse/HIVE-988
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Ning Zhang
>             Fix For: 0.5.0
>
>         Attachments: HIVE-988.patch, HIVE-988_2.patch
>
>
> If the input to the map join is larger than a specific threshold, it may lead
> to a very slow execution of the join.
> It is better to throw an error, and let the user redo his query as a non
> map-join query.
> However, the current map-reduce framework will retry the mapper 4 times
> before actually killing the job.
> Based on an offline discussion with Dhruba, Ning and myself, we came up with
> the following algorithm:
> Keep a threshold in the mapper for the number of rows to be processed for
> map-join. If the number of rows exceeds that threshold, set a counter and
> kill that mapper.
> The client (ExecDriver) monitors that job continuously - if this counter is
> set, it kills the job and also shows an appropriate error message to the
> user, so that he can retry the query without the map join.
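
The algorithm in the issue description could be sketched roughly as follows. This is a minimal, single-process simulation and not the code in the attached patches: every name in it (MapJoinGuardSketch, JobState, MapJoinMapper, pollOnce, runJob) is hypothetical, and the shared AtomicLong only stands in for a job counter that both the mapper and the client can see. No Hive or Hadoop API is used.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; none of these names are Hive or Hadoop APIs. */
class MapJoinGuardSketch {

    /** Thrown by the mapper to abort itself once the row limit is exceeded. */
    static class RowLimitExceededException extends RuntimeException {
        RowLimitExceededException(String message) {
            super(message);
        }
    }

    /** Stand-in for job-level state visible to both the mapper and the client. */
    static class JobState {
        final AtomicLong fatalErrorCounter = new AtomicLong(0);
        volatile boolean killed = false;
    }

    /** Mapper side: count rows; past the threshold, set the counter and abort. */
    static class MapJoinMapper {
        private final long maxRows;
        private final JobState job;
        private long rowsSeen = 0;

        MapJoinMapper(long maxRows, JobState job) {
            this.maxRows = maxRows;
            this.job = job;
        }

        void processRow(String row) {
            rowsSeen++;
            if (rowsSeen > maxRows) {
                job.fatalErrorCounter.incrementAndGet();
                throw new RowLimitExceededException(
                        "map join saw more than " + maxRows + " rows");
            }
            // The actual join work for this row would go here.
        }
    }

    /** Client side: one monitoring pass. Returns an error message, or null if all is well. */
    static String pollOnce(JobState job) {
        if (job.fatalErrorCounter.get() > 0) {
            // Kill the whole job rather than letting the framework retry the mapper.
            job.killed = true;
            return "Map join input exceeded the row threshold; "
                    + "retry the query without the mapjoin hint.";
        }
        return null;
    }

    /** Runs one simulated mapper over inputRows rows, then lets the client check the counter. */
    static String runJob(long maxRows, int inputRows) {
        JobState job = new JobState();
        MapJoinMapper mapper = new MapJoinMapper(maxRows, job);
        try {
            for (int i = 0; i < inputRows; i++) {
                mapper.processRow("row-" + i);
            }
        } catch (RowLimitExceededException e) {
            // The mapper has killed itself; the client sees the counter on its next poll.
        }
        return pollOnce(job);
    }

    public static void main(String[] args) {
        System.out.println("small input: " + runJob(1000, 10));
        System.out.println("large input: " + runJob(1000, 5000));
    }
}
```

In this sketch the client checks the counter once, after the mapper has stopped. A real client would poll while the job is still running, so that it can kill the job before the framework schedules a retry of the failed mapper.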