[ https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654321#action_12654321 ]
Joydeep Sen Sarma commented on HIVE-131:
----------------------------------------

I quickly glanced at the doc mentioned. It seems like Hadoop will move the files automatically to mapred.output.dir, but we don't have a single output directory for the task (since we can have multiple outputs).

Anyway, I am not sure why the problem happens (it almost seems as if map-reduce can declare a job complete while (speculative) tasks are still running), but a trivial fix is to create the tmp files in a completely different directory (say, a scratch dir per query) and then move them from there. We can discard the scratch dir entirely on query completion.

There's still some risk of these runaway files leaking inodes/space, but if they live in known Hive scratch dir locations, they can always be cleaned up later on.

> insert overwrite directory leaves behind uncommitted/tmp files from failed tasks
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-131
>                 URL: https://issues.apache.org/jira/browse/HIVE-131
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Priority: Critical
>
> _tmp files are getting left behind on insert overwrite directory:
>
> /user/jssarma/ctst1/40422_m_000195_0.deflate <r 3> 13285 2008-12-07 01:47 rw-r--r-- jssarma supergroup
> /user/jssarma/ctst1/40422_m_000196_0.deflate <r 3> 3055 2008-12-07 01:46 rw-r--r-- jssarma supergroup
> /user/jssarma/ctst1/_tmp.40422_m_000033_0 <r 3> 0 2008-12-07 01:53 rw-r--r-- jssarma supergroup
> /user/jssarma/ctst1/_tmp.40422_m_000037_1 <r 3> 0 2008-12-07 01:53 rw-r--r-- jssarma supergroup
>
> This happened with speculative execution. The code looks good (in fact, in this case many speculative tasks were launched, and only a couple caused problems). It almost seems as if these files did not appear in the namespace until after the map-reduce job finished and the MoveTask did a listing of the output dir.
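The scratch-directory workaround proposed in the comment can be sketched as follows. This is a minimal illustration, not Hive's actual code: it uses the local filesystem through `java.nio.file` as a stand-in for HDFS, and the class and method names (`ScratchDirCommit`, `createQueryScratchDir`, `writeTaskOutput`, `commitTaskOutput`, `moveCommittedAndDiscard`) are hypothetical. Following the file names in the listing above, it assumes a task attempt writes to `_tmp.<attempt id>` and renames the file to its final name only when it succeeds.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch of the per-query scratch dir idea; the local FS stands in for HDFS. */
public class ScratchDirCommit {
    static final String TMP_PREFIX = "_tmp.";

    /** Creates a fresh scratch directory for one query under a known scratch root. */
    static Path createQueryScratchDir(Path scratchRoot, String queryId) throws IOException {
        return Files.createDirectories(scratchRoot.resolve(queryId));
    }

    /** A task attempt writes its in-progress output as _tmp.<attemptId> inside the scratch dir. */
    static Path writeTaskOutput(Path scratchDir, String attemptId, String data) throws IOException {
        return Files.writeString(scratchDir.resolve(TMP_PREFIX + attemptId), data);
    }

    /** A successful attempt commits by renaming its _tmp file to its final name (still in scratch). */
    static Path commitTaskOutput(Path tmpFile) throws IOException {
        String finalName = tmpFile.getFileName().toString().substring(TMP_PREFIX.length());
        return Files.move(tmpFile, tmpFile.resolveSibling(finalName), StandardCopyOption.ATOMIC_MOVE);
    }

    /** After the job: move committed files to the target dir, then drop the whole scratch dir. */
    static List<String> moveCommittedAndDiscard(Path scratchDir, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        List<Path> entries;
        try (Stream<Path> s = Files.list(scratchDir)) {
            entries = s.collect(Collectors.toList()); // snapshot before moving anything
        }
        List<String> moved = new ArrayList<>();
        for (Path f : entries) {
            String name = f.getFileName().toString();
            if (name.startsWith(TMP_PREFIX)) {
                continue; // uncommitted attempt output never reaches the target dir
            }
            Files.move(f, targetDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
            moved.add(name);
        }
        deleteRecursively(scratchDir); // leftover _tmp files are discarded with the scratch dir
        Collections.sort(moved);
        return moved;
    }

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse lexicographic order puts every descendant before its ancestors.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("hive-scratch-sketch");
        Path scratch = createQueryScratchDir(base.resolve("scratch"), "query_40422");
        Path target = base.resolve("ctst1");

        // Attempt 0 of map task 195 finishes and commits.
        commitTaskOutput(writeTaskOutput(scratch, "40422_m_000195_0", "rows"));
        // A speculative attempt of the same task is killed before committing.
        writeTaskOutput(scratch, "40422_m_000195_1", "");

        System.out.println(moveCommittedAndDiscard(scratch, target)); // [40422_m_000195_0]
        System.out.println(Files.exists(scratch));                    // false
        deleteRecursively(base);
    }
}
```

Because uncommitted attempts only ever write inside the per-query scratch directory, a stray `_tmp.` file no longer lands in the user's target directory. At worst it occupies space under the scratch root until that directory is removed, which matches the residual risk noted in the comment.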