Hi Vijith

Maybe Markus Jelsma has already solved this issue by keeping a list of
crawled URLs in an external Bloom filter, so you could ask Markus to
confirm.
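A rough sketch of the idea, to make it concrete. This is not Markus's actual code and not a Nutch API. The class UrlBloomFilter and its methods are hypothetical, and the filter size and hash count are arbitrary assumptions. The idea is to record each URL once it has been fetched, and to skip a redirect target the filter says it has probably seen. A Bloom filter never gives false negatives but can give false positives, so a small fraction of genuinely new URLs could be skipped.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical sketch: a small in-memory Bloom filter for crawled URLs.
// Not Nutch code; names and parameters are illustrative assumptions.
public class UrlBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int numHashes;

    public UrlBloomFilter(int size, int numHashes) {
        if (size <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("size and numHashes must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.numHashes = numHashes;
    }

    // 32-bit FNV-1a hash over the URL's UTF-8 bytes (second base hash).
    private static int fnv1a(String s) {
        int hash = 0x811c9dc5;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 0x01000193;
        }
        return hash;
    }

    // Set numHashes bits, using double hashing: h1 + i * h2, wrapped into [0, size).
    public void put(String url) {
        int h1 = url.hashCode();
        int h2 = fnv1a(url);
        for (int i = 0; i < numHashes; i++) {
            bits.set(Math.floorMod(h1 + i * h2, size));
        }
    }

    // False means "definitely never added"; true means "probably added".
    public boolean mightContain(String url) {
        int h1 = url.hashCode();
        int h2 = fnv1a(url);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(Math.floorMod(h1 + i * h2, size))) {
                return false;
            }
        }
        return true;
    }

    // Returns true (and records the URL) only if it was not seen before.
    public boolean markIfNew(String url) {
        if (mightContain(url)) {
            return false;
        }
        put(url);
        return true;
    }

    public static void main(String[] args) {
        UrlBloomFilter seen = new UrlBloomFilter(1 << 20, 5);
        // Two source URLs (A, B) both redirecting to the same target C.
        String[] redirectTargets = { "http://example.com/c", "http://example.com/c" };
        for (String target : redirectTargets) {
            if (seen.markIfNew(target)) {
                System.out.println("fetch " + target);
            } else {
                System.out.println("skip " + target);
            }
        }
    }
}
```

Running main prints "fetch http://example.com/c" and then "skip http://example.com/c", so the second redirect to C is dropped. How such a check would be wired into handleRedirect in Fetcher.java is not shown here.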

On Sat, Sep 1, 2012 at 2:05 PM, Vijith <[email protected]> wrote:

> Thanks a lot, Feng. I will try that.
>
>
> On Sat, Sep 1, 2012 at 7:36 AM, feng lu <[email protected]> wrote:
>
>> Hi Vijith
>>
>> It only happens when fetcher.parse is true and
>> fetcher.follow.outlinks.depth is greater than 0. When two URLs (A, B)
>> redirect to the same URL (C), that URL will be fetched twice. I think you
>> could deduplicate URL (C) in the handleRedirect function in Fetcher.java.
>>
>> On Fri, Aug 31, 2012 at 8:39 PM, Lewis John Mcgibbney <
>> [email protected]> wrote:
>>
>>> No hassle Vijith
>>>
>>> Thank you
>>>
>>> Lewis
>>>
>>> On Fri, Aug 31, 2012 at 1:37 PM, Vijith <[email protected]> wrote:
>>> > I apologize. I was sending to the mailing list without subscribing to it.
>>> > I found the reply from Lewis (in the archive). I will comment directly on
>>> > the issue. Thanks.
>>> >
>>> >
>>> > On Fri, Aug 31, 2012 at 5:59 PM, Vijith <[email protected]> wrote:
>>> >>
>>> >> Hi all,
>>> >>
>>> >> (Please ignore my previous mail, if any)
>>> >>
>>> >> I am new to Nutch development. I am working on NUTCH-1150
>>> >> (https://issues.apache.org/jira/browse/NUTCH-1150) and would like some
>>> >> direction before I start. Right now I am going through the Fetcher.java
>>> >> code.
>>> >>
>>> >> I tried running Nutch against a sample site with two different URLs
>>> >> redirecting to a common resource, but I could not find any evidence in
>>> >> hadoop.log that the common resource was parsed multiple times.
>>> >> Could someone please explain the exact scenario that triggers this bug?
>>> >>
>>> >> And how does this bug relate to NUTCH-1184?
>>> >>
>>> >> --
>>> >> Vijith V.
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > . . . . . thanks & regards
>>> >
>>> > Vijith V.
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Lewis
>>>
>>
>>
>>
>> --
>> Don't Grow Old, Grow Up... :-)
>>
>
>
>
> --
> . . . . . thanks & regards
>
> Vijith V.
>
>
>


-- 
Don't Grow Old, Grow Up... :-)
