Hi Vijith,

It only happens when fetcher.parse is true and fetcher.follow.outlinks.depth is greater than 0. When two URLs (A and B) redirect to the same URL (C), C is fetched twice. I think you could deduplicate URL C in the handleRedirect function in Fetcher.java.
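To show roughly what I mean by deduplicating, here is a minimal sketch. It is only an illustration: RedirectDeduplicator and markIfFirst are made-up names, not existing Nutch code, and I am assuming the redirect target has already been normalized to a plain string before the check.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of Nutch: remembers redirect targets
// that have already been queued so the same target is not fetched twice.
final class RedirectDeduplicator {
    // Concurrent set, because several fetcher threads may follow redirects at once.
    private final Set<String> seenTargets = ConcurrentHashMap.newKeySet();

    /**
     * Returns true only the first time a given redirect target is offered.
     * Assumes targetUrl is already normalized (an assumption of this sketch).
     */
    boolean markIfFirst(String targetUrl) {
        // Set.add returns false when the element was already present.
        return seenTargets.add(targetUrl);
    }
}
```

The idea would be that the redirect handling only queues C when markIfFirst(C) returns true, so the redirect from A queues it and the later redirect from B is skipped.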
On Fri, Aug 31, 2012 at 8:39 PM, Lewis John Mcgibbney <[email protected]> wrote:

> No hassle Vijith
>
> Thank you
>
> Lewis
>
> On Fri, Aug 31, 2012 at 1:37 PM, Vijith <[email protected]> wrote:
> > I apologize..I was sending to mailing list with out subscribing to it. I
> > found the reply from Lewis (from archive). I will comment directly on the
> > issue. Thanks.
> >
> > On Fri, Aug 31, 2012 at 5:59 PM, Vijith <[email protected]> wrote:
> >>
> >> Hi all,
> >>
> >> (Please ignore my previous mail, if any)
> >>
> >> I am new to dev... I am working on
> >> NUTCH-1150...https://issues.apache.org/jira/browse/NUTCH-1150
> >> I would like to get some directions before I can start... Right now I am
> >> going through the Fetcher.java code...
> >>
> >> I have tried running nutch with a sample site with two different urls
> >> redirecting to a common resource.
> >> I could not find any clues, from hadoop.log, where the common resource is
> >> parsed multiple times.
> >> Could some one please explain the exact scenario that creates this bug.
> >>
> >> And how does this bug relates to NUTCH-1184 ?
> >>
> >> --
> >> Vijith V.
> >
> > --
> > . . . . . thanks & regards
> >
> > Vijith V.
>
> --
> Lewis

--
Don't Grow Old, Grow Up... :-)

