Hi Vijith,

Markus Jelsma may have already solved this issue by keeping a list of crawled URLs in an external Bloom filter. You could ask Markus to confirm.
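
To make the idea concrete, here is a rough sketch of how a "seen URLs" check could look. It is only an illustration: SeenUrlFilter and markIfNew are names I made up and are not part of Nutch, the filter here lives in memory rather than in an external store, and I am assuming it may be called from several fetcher threads (hence synchronized). Keep in mind that a Bloom filter can give false positives, so now and then a URL that was never fetched would be treated as already seen.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/**
 * Hypothetical helper, not part of Nutch: a small in-memory Bloom filter
 * that remembers which URLs have already been handed to the fetcher.
 */
class SeenUrlFilter {
  private final BitSet bits;
  private final int numBits;
  private final int numHashes;

  SeenUrlFilter(int numBits, int numHashes) {
    if (numBits <= 0 || numHashes <= 0) {
      throw new IllegalArgumentException("numBits and numHashes must be positive");
    }
    this.bits = new BitSet(numBits);
    this.numBits = numBits;
    this.numHashes = numHashes;
  }

  /**
   * Records the URL and returns true if it was definitely not seen before.
   * Returns false if it was probably seen (false positives are possible).
   */
  synchronized boolean markIfNew(String url) {
    byte[] data = url.getBytes(StandardCharsets.UTF_8);
    int h1 = fnv1a(data, 0x811C9DC5);
    int h2 = fnv1a(data, 0x01000193) | 1; // force an odd step for double hashing
    boolean isNew = false;
    for (int i = 0; i < numHashes; i++) {
      int index = Math.floorMod(h1 + i * h2, numBits);
      if (!bits.get(index)) {
        isNew = true;      // at least one bit was unset, so the URL is new
        bits.set(index);
      }
    }
    return isNew;
  }

  /** 32-bit FNV-1a style hash with a caller-supplied starting value. */
  private static int fnv1a(byte[] data, int seed) {
    int hash = seed;
    for (byte b : data) {
      hash ^= (b & 0xff);
      hash *= 0x01000193;
    }
    return hash;
  }
}
```

With something like this, the redirect handling could call markIfNew(redirectTarget) and only follow the redirect when it returns true, so when A and B both redirect to C, the second redirect to C would be skipped.
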

On Sat, Sep 1, 2012 at 2:05 PM, Vijith <[email protected]> wrote:

> Thanks a lot Feng. I will try the same...
>
> On Sat, Sep 1, 2012 at 7:36 AM, feng lu <[email protected]> wrote:
>
>> Hi Vijith
>>
>> it only happen when the fetcher.parse is true and
>> fetcher.follow.outlinks.depth is greater than 0. When Two url (A,B)
>> direct to same url (C) and that url will fetch twice, maybe i think you can
>> deduplicate the url (C) in handleRedirect function in fetcher.java.
>>
>> On Fri, Aug 31, 2012 at 8:39 PM, Lewis John Mcgibbney <
>> [email protected]> wrote:
>>
>>> No hassle Vijith
>>>
>>> Thank you
>>>
>>> Lewis
>>>
>>> On Fri, Aug 31, 2012 at 1:37 PM, Vijith <[email protected]> wrote:
>>> > I apologize..I was sending to mailing list with out subscribing to it. I
>>> > found the reply from Lewis (from archive). I will comment directly on the
>>> > issue. Thanks.
>>> >
>>> > On Fri, Aug 31, 2012 at 5:59 PM, Vijith <[email protected]> wrote:
>>> >>
>>> >> Hi all,
>>> >>
>>> >> (Please ignore my previous mail, if any)
>>> >>
>>> >> I am new to dev... I am working on
>>> >> NUTCH-1150...https://issues.apache.org/jira/browse/NUTCH-1150
>>> >> I would like to get some directions before I can start... Right now I am
>>> >> going through the Fetcher.java code...
>>> >>
>>> >> I have tried running nutch with a sample site with two different urls
>>> >> redirecting to a common resource.
>>> >> I could not find any clues, from hadoop.log, where the common resource is
>>> >> parsed multiple times.
>>> >> Could some one please explain the exact scenario that creates this bug.
>>> >>
>>> >> And how does this bug relates to NUTCH-1184 ?
>>> >>
>>> >> --
>>> >> Vijith V.
>>> >
>>> > --
>>> > . . . . . thanks & regards
>>> >
>>> > Vijith V.
>>>
>>> --
>>> Lewis
>>
>> --
>> Don't Grow Old, Grow Up... :-)
>
> --
> *. . . . . thanks & regards*
>
> *Vijith V.*

--
Don't Grow Old, Grow Up... :-)

