[ 
https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654321#action_12654321
 ] 

Joydeep Sen Sarma commented on HIVE-131:
----------------------------------------

I glanced quickly at the doc mentioned. It seems that Hadoop moves the files 
automatically to mapred.output.dir, but we don't have a single output 
directory for the task (since we can have multiple outputs).

Anyway, I am not sure why the problem happens (it almost seems like map-reduce 
can declare a job complete while (speculative) tasks are still running), but a 
trivial fix is to create the tmp files in a completely different directory 
(say, a scratch dir per query) and then move them from there. We can discard 
the scratch dir entirely on query completion. There is still some risk of 
these runaway files leaking inodes/space, but if they sit in known Hive 
scratch dir locations, they can always be cleaned up later.
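
A rough sketch of the scratch-dir idea above, using the local filesystem as a 
stand-in for HDFS. The function name, the (name, data, succeeded) tuples and 
the "hive_scratch_" prefix are made up for illustration; this is not Hive code, 
only the shape of the fix under those assumptions.

```python
import os
import shutil
import tempfile
import uuid


def run_query_with_scratch_dir(final_dir, scratch_root, task_outputs):
    """Hypothetical helper: write task output under a per-query scratch dir,
    move only committed files into final_dir, then drop the scratch dir.

    task_outputs is an assumed list of (file_name, data_bytes, succeeded).
    """
    # One scratch dir per query, kept well away from the final output dir.
    scratch_dir = os.path.join(scratch_root, "hive_scratch_" + uuid.uuid4().hex)
    os.makedirs(scratch_dir)
    os.makedirs(final_dir, exist_ok=True)
    try:
        committed = []
        for name, data, succeeded in task_outputs:
            # Every attempt writes to a _tmp file inside the scratch dir.
            tmp_path = os.path.join(scratch_dir, "_tmp." + name)
            with open(tmp_path, "wb") as f:
                f.write(data)
            if succeeded:
                # Commit = rename within the scratch dir.
                done_path = os.path.join(scratch_dir, name)
                os.rename(tmp_path, done_path)
                committed.append(done_path)
        # Move only committed files; failed or runaway _tmp files stay behind.
        for path in committed:
            shutil.move(path, os.path.join(final_dir, os.path.basename(path)))
    finally:
        # Discard the whole scratch dir on query completion.
        shutil.rmtree(scratch_dir, ignore_errors=True)
    return sorted(os.listdir(final_dir))
```

Since the move step only picks up files it knows were committed, a _tmp file 
that shows up late in the scratch dir never reaches the output directory; at 
worst it leaks inside a known scratch location.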

> insert overwrite directory leaves behind uncommitted/tmp files from failed 
> tasks
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-131
>                 URL: https://issues.apache.org/jira/browse/HIVE-131
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Priority: Critical
>
> _tmp files are getting left behind on insert overwrite directory:
> /user/jssarma/ctst1/40422_m_000195_0.deflate  <r 3> 13285 2008-12-07 01:47  rw-r--r-- jssarma supergroup
> /user/jssarma/ctst1/40422_m_000196_0.deflate  <r 3> 3055  2008-12-07 01:46  rw-r--r-- jssarma supergroup
> /user/jssarma/ctst1/_tmp.40422_m_000033_0 <r 3> 0 2008-12-07 01:53  rw-r--r-- jssarma supergroup
> /user/jssarma/ctst1/_tmp.40422_m_000037_1 <r 3> 0 2008-12-07 01:53  rw-r--r-- jssarma supergroup
> This happened with speculative execution. The code looks good (in fact, in 
> this case many speculative tasks were launched and only a couple caused 
> problems). It almost seems like these files did not appear in the namespace 
> until after the map-reduce job finished and the movetask did a listing of the 
> output dir.

