[ https://issues.apache.org/jira/browse/NUTCH-829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643890#comment-13643890 ]
Hudson commented on NUTCH-829:
------------------------------
Integrated in Nutch-trunk #2183 (See [https://builds.apache.org/job/Nutch-trunk/2183/])
NUTCH-829 duplicate hadoop temp files (Revision 1476702)
Result = SUCCESS
tejasp : http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1476702
Files :
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java
> duplicate hadoop temp files
> ---------------------------
>
> Key: NUTCH-829
> URL: https://issues.apache.org/jira/browse/NUTCH-829
> Project: Nutch
> Issue Type: Bug
> Components: generator
> Affects Versions: 1.0.0, 1.1
> Reporter: Mike Baranczak
> Priority: Minor
> Fix For: 1.7, 2.2
>
> Attachments: NUTCH-829.patch, NUTCH-829.v2.patch
>
>
> When two crawls are started at exactly the same time, I see the following
> error:
> {quote}
> org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/tmp/hadoop-mike/mapred/temp/generate-temp-1276463469075 already exists
> at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:111)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:793)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
> at org.apache.nutch.crawl.Generator.generate(Generator.java:472)
> at org.apache.nutch.crawl.Generator.generate(Generator.java:409)
> [...]
> {quote}
> I traced it down to this code in Generator (I'm using Nutch 1.0, but this is
> still in the trunk):
> {quote}
> Path tempDir =
> new Path(getConf().get("mapred.temp.dir", ".") +
> "/generate-temp-"+ System.currentTimeMillis());
> {quote}
> I admit that this is an unlikely scenario for most users, but it just so
> happens that I ran into it. To absolutely guarantee that the temp directory
> doesn't already exist, I suggest changing System.currentTimeMillis() to
> java.util.UUID.randomUUID().toString().
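
The collision described above can be illustrated with a minimal, self-contained sketch. It is not the Nutch code and not necessarily what revision 1476702 changed: the class TempDirName and the helpers timestampName and uuidName are hypothetical names, and plain strings stand in for Hadoop's Path so that the example runs without Hadoop on the classpath. Note that a random UUID makes a collision extremely unlikely rather than strictly impossible.

```java
import java.util.UUID;

// Hypothetical illustration only; none of these names exist in Nutch.
final class TempDirName {

    // Mirrors the naming scheme quoted in the report: base + "/generate-temp-" + millis.
    static String timestampName(String base, long millis) {
        return base + "/generate-temp-" + millis;
    }

    // The reporter's suggestion: use a random UUID instead of the timestamp.
    static String uuidName(String base) {
        return base + "/generate-temp-" + UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // Two generators that read the clock in the same millisecond
        // compute the same directory name.
        long sameInstant = System.currentTimeMillis();
        String first = timestampName(".", sameInstant);
        String second = timestampName(".", sameInstant);
        System.out.println("timestamp names equal: " + first.equals(second));

        // Each call draws a fresh random UUID, so the names differ
        // except with negligible probability.
        String third = uuidName(".");
        String fourth = uuidName(".");
        System.out.println("uuid names equal: " + third.equals(fourth));
    }
}
```

In the real Generator, the resulting string would presumably still be wrapped in new Path(...) with the "mapred.temp.dir" lookup as the base, as in the snippet quoted in the report.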