[ 
https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694942#action_12694942
 ] 

Julien Nioche commented on NUTCH-692:
-------------------------------------

As I pointed out in my previous message, the root of the problem in my case was 
related to some dodgy URLs coming from the JavaScript parser, which put the 
basic normalizer into a spin. This would indeed repeat in subsequent attempts.

However, the AlreadyBeingCreatedException should not happen, and we should not 
have output files left open. If your patch fixes that, I am sure it will 
be a very welcome contribution.

> AlreadyBeingCreatedException with Hadoop 0.19
> ---------------------------------------------
>
>                 Key: NUTCH-692
>                 URL: https://issues.apache.org/jira/browse/NUTCH-692
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Julien Nioche
>
> I have been using the SVN version of Nutch on an EC2 cluster and got some 
> AlreadyBeingCreatedException during the reduce phase of a parse. For some 
> reason one of my tasks crashed and then I ran into this 
> AlreadyBeingCreatedException when other nodes tried to pick it up.
> There was recently a discussion on the Hadoop user list on similar issues 
> with Hadoop 0.19 (see 
> http://markmail.org/search/after+upgrade+to+0%2E19%2E0). I have not tried 
> using 0.18.2 yet but will do so if the problems persist with 0.19.
> I was wondering whether anyone else had experienced the same problem. Do you 
> think 0.19 is stable enough to use it for Nutch 1.0?
> I will be running a crawl on a super large cluster in the next couple of 
> weeks and I will confirm this issue.
> J.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
