-----Original message-----
> From:Vijith <[email protected]>
> Sent: Fri 31-Aug-2012 15:44
> To: [email protected]
> Subject: Re: Need some directions
> 
> I have tried running Nutch against a sample site with two different URLs 
> redirecting to a common resource.
> I could not find any sign in hadoop.log that the common resource is 
> parsed multiple times.
> Could someone please explain the exact scenario that creates this bug?

In the Jira comment you said it fetched page4 twice now.

> 
> And how does this bug relate to NUTCH-1184?

It relates to NUTCH-1184 because if URLs in the same fetch list link to a 
common page, that page can be followed as well.

We solved this issue by keeping a list of crawled URLs in an external Bloom 
filter.
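
A minimal sketch of the idea, for illustration only: it assumes a plain 
in-process BitSet rather than an external service, and the class name 
SeenUrlFilter and its addAndCheck method are hypothetical, not part of Nutch 
or of the setup described above.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical Bloom filter of already-crawled URLs (not Nutch code). */
class SeenUrlFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SeenUrlFilter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Seeded 32-bit FNV-1a style hash over the URL bytes.
    private static int hash(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }

    /**
     * Records the URL and reports whether it was possibly seen before.
     * false means definitely new; true may be a false positive.
     */
    synchronized boolean addAndCheck(String url) {
        byte[] data = url.getBytes(StandardCharsets.UTF_8);
        int h1 = hash(data, 0);
        int h2 = hash(data, 0x9E3779B9) | 1; // odd step for double hashing
        boolean possiblySeen = true;
        for (int i = 0; i < numHashes; i++) {
            int idx = Math.floorMod(h1 + i * h2, numBits);
            if (!bits.get(idx)) {
                possiblySeen = false; // at least one bit was unset: new URL
                bits.set(idx);
            }
        }
        return possiblySeen;
    }
}
```

A fetcher thread would call addAndCheck(url) before following a redirect or 
outlink and skip the URL when it returns true. A Bloom filter can report 
false positives, so a small fraction of new URLs may be skipped, but it never 
misses a URL that was actually added.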

> 
> On Thu, Aug 30, 2012 at 11:44 AM, Vijith <[email protected]> wrote:
> Hi all, 
> 
> I am new to dev... I am working on NUTCH-1150...
> I would like to get some directions before I can start... Right now I am 
> going through the Fetcher.java code...
> 
> -- 
> . . . . . thanks & regards
> 
> Vijith V.
