[ https://issues.apache.org/jira/browse/CRUNCH-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14963217#comment-14963217 ]
Gabriel Reid commented on CRUNCH-575:
-------------------------------------

This issue (or one very similar to it) is discussed in CRUNCH-515.

[~srowen] could you take a quick look at that one first, and see whether the underlying problem that you're encountering is the same as the one mentioned on that ticket (crashing pipelines, or pipelines that weren't calling pipeline.done())? It would be good to have an additional sample point to help determine whether making this change would just hide a different issue (one which leads to a huge number of temp directories), or whether we are simply running into the limits of 32 bits.

On the other hand, if we really want to avoid collisions (and if this isn't due to pipelines that aren't being cleaned up correctly), maybe a UUID is (even) better than a long as the randomizer in the temp dir name.

> DistributedPipeline temp dir choice can collide with itself
> -----------------------------------------------------------
>
>                 Key: CRUNCH-575
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-575
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.12.0
>            Reporter: Sean Owen
>            Assignee: Josh Wills
>            Priority: Minor
>         Attachments: CRUNCH_575.patch
>
>
> We've observed that Crunch jobs can fail because the output temp dir already
> exists:
> {code}
> 2015-04-02 04:45:49,208 INFO
> org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob:
> org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory
> /tmp/crunch-686245394/p2/output already exists
>     at
> org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132)
> {code}
> One possible cause is the choice of random directory name, which is based on
> a random nonnegative 32-bit int. The chance of collision is more than 50% at
> about 55,000 temp dirs, which is not unimaginable.
> A suggested fix, at least for that theoretical cause, is to generate a much
> larger random value. 64 bits should put this firmly in the realm of extremely
> improbable (billions, not tens of thousands).
> (HT [~wilfreds] / CC [~tomwhite])

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
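
For reference, the collision estimates quoted in the issue description can be checked with a quick birthday-bound calculation. The following is only a rough sketch, not Crunch code: the class and method names are hypothetical, the estimate assumes temp dir names are drawn uniformly at random and that old directories are never removed, and the UUID-based name is merely an illustration of the alternative randomizer mentioned in the comment.

```java
import java.util.UUID;

// Hypothetical helper class; nothing here is part of the Crunch API.
class TempDirCollision {

    // Approximate number of uniformly random draws from a space of the given
    // size at which the chance of at least one collision reaches about 50%
    // (birthday bound: n ~ sqrt(2 * N * ln 2)).
    static double drawsForHalfCollisionChance(double spaceSize) {
        return Math.sqrt(2.0 * spaceSize * Math.log(2.0));
    }

    // Illustrative only: a temp dir name that uses a random UUID as the
    // randomizer. The "crunch-" prefix mirrors the path in the log above.
    static String uuidTempDirName() {
        return "crunch-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        // A nonnegative 32-bit int has 2^31 possible values,
        // a nonnegative 64-bit long has 2^63.
        double int31 = Math.pow(2, 31);
        double long63 = Math.pow(2, 63);

        System.out.printf("nonnegative int:  %.0f dirs%n",
                drawsForHalfCollisionChance(int31));
        System.out.printf("nonnegative long: %.3e dirs%n",
                drawsForHalfCollisionChance(long63));
        System.out.println("UUID-based name:  " + uuidTempDirName());
    }
}
```

Under those assumptions the bound comes out at roughly 54,600 directories for a nonnegative int and about 3.6 billion for a nonnegative long, which is consistent with the "about 55,000" and "billions" figures in the description. It says nothing about whether the failures actually observed stem from random collisions or from temp dirs left behind by pipelines that were not cleaned up.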