[ 
https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702412#action_12702412
 ] 

Julien Nioche commented on NUTCH-692:
-------------------------------------

OK, I had the same problem again on my main cluster: one of the nodes lost 
contact with the master during a parse, and the subsequent attempts failed 
with AlreadyBeingCreatedException.

I managed to reproduce the problem locally using a fresh copy from SVN by 
hacking the BasicURLNormalizer to make it sleep for 5 minutes every time it 
gets a URL, which gave me plenty of time to fail a reduce task with 

./hadoop job -fail-task attempt_200904241525_0007_r_000000_0

As expected, the following attempts failed with AlreadyBeingCreatedException.
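
For anyone wanting to repeat the experiment, the delay hack can be sketched 
roughly as below. This is a standalone illustration only, not the actual 
change made to BasicURLNormalizer: the class SlowNormalizerSketch and the 
helper withDelay are hypothetical names, and the "normalizer" here is just a 
plain String-to-String function standing in for the real Nutch class.

```java
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the hacked normalizer; not part of Nutch.
final class SlowNormalizerSketch {

    // Delay used in the experiment described above: 5 minutes per URL.
    static final long FIVE_MINUTES_MS = 5L * 60L * 1000L;

    // Wraps a normalizer so that every call sleeps before doing its work,
    // keeping the task alive long enough to be failed by hand.
    static UnaryOperator<String> withDelay(UnaryOperator<String> normalizer,
                                           long delayMillis) {
        return url -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and carry on normalizing.
                Thread.currentThread().interrupt();
            }
            return normalizer.apply(url);
        };
    }

    public static void main(String[] args) {
        // A short delay by default so the demo finishes quickly;
        // pass a value in milliseconds (e.g. 300000) to mimic the experiment.
        long delay = args.length > 0 ? Long.parseLong(args[0]) : 100L;
        UnaryOperator<String> slow = withDelay(String::trim, delay);
        System.out.println(slow.apply("  http://example.com/  "));
    }
}
```

With a delay of FIVE_MINUTES_MS in place of the short default, a reduce task 
stays busy long enough to be killed with the -fail-task command above.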

I did the same experiment using your patch and can confirm that it solves the 
problem. 

Thanks

J.

> AlreadyBeingCreatedException with Hadoop 0.19
> ---------------------------------------------
>
>                 Key: NUTCH-692
>                 URL: https://issues.apache.org/jira/browse/NUTCH-692
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Julien Nioche
>         Attachments: NUTCH-692.patch
>
>
> I have been using the SVN version of Nutch on an EC2 cluster and got some 
> AlreadyBeingCreatedException during the reduce phase of a parse. For some 
> reason one of my tasks crashed and then I ran into this 
> AlreadyBeingCreatedException when other nodes tried to pick it up.
> There was recently a discussion on the Hadoop user list about similar issues 
> with Hadoop 0.19 (see 
> http://markmail.org/search/after+upgrade+to+0%2E19%2E0). I have not tried 
> using 0.18.2 yet but will do so if the problems persist with 0.19.
> I was wondering whether anyone else had experienced the same problem. Do you 
> think 0.19 is stable enough to use for Nutch 1.0?
> I will be running a crawl on a super large cluster in the next couple of 
> weeks and I will confirm this issue.
> J.

