Hi Canan,
Yes, the "count" variable is never changed; that may be a bug. But your problem
may not be caused by this issue: in your 3rd iteration it may be caused by a
fetch or parse failure, so no new outlinks are generated.
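For reference, the usual fix for a loop like the one Canan quoted is to increment "count" whenever an outlink is actually kept. A minimal sketch below, using the names from the quoted line (outlinks, maxOutlinks, count) but not the exact Nutch source:

```java
import java.util.ArrayList;
import java.util.List;

public class OutlinkLimit {
    // Sketch of the corrected limiting loop; names follow the quoted
    // ParseUtil line, not the actual Nutch 2.x source.
    static List<String> limitOutlinks(String[] outlinks, int maxOutlinks) {
        List<String> kept = new ArrayList<>();
        int count = 0;
        for (int i = 0; count < maxOutlinks && i < outlinks.length; i++) {
            kept.add(outlinks[i]);
            count++; // the missing increment: without it, count stays 0
                     // and maxOutlinks is never honored
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] links = {"a", "b", "c", "d", "e"};
        System.out.println(limitOutlinks(links, 3)); // prints [a, b, c]
    }
}
```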


On Fri, Jun 28, 2013 at 8:52 PM, Canan GİRGİN <[email protected]>wrote:

> Hi Lewis,
>
> The 'db.max.outlinks.per.page' parameter never actually takes effect in the
> Nutch 2.x source code.
>
>
> It is handled by the ParseUtil class in this line:
> for (int i = 0; count < maxOutlinks && i < outlinks.length; i++)
>
> But the "count" variable is never changed, so the limit is never enforced.
>
> Canan
>
>
>
>
>
> On Fri, Jun 28, 2013 at 2:32 PM, Jamshaid Ashraf <[email protected]>wrote:
>
>> Hi,
>>
>> I have followed the given link and updated 'db.max.outlinks.per.page' to
>> -1 in the 'nutch-default' file.
>>
>> But I am facing the same issue while crawling
>> http://www.halliburton.com/en-US/default.page and cnn.com. Below is the
>> last line of the fetcher job, which shows 0 pages found on the 3rd or 4th
>> iteration:
>>
>> 0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s, 0 URLs
>> in 0 queues
>> -activeThreads=0
>> FetcherJob: done
>>
>> Please note that when I crawl amazon and other sites it works fine. Do you
>> think it is because of some restriction on halliburton's side (robots.txt)
>> or some misconfiguration at my end?
>>
>> Regards,
>> Jamshaid
>>
>>
>> On Fri, Jun 28, 2013 at 12:37 AM, Lewis John Mcgibbney <
>> [email protected]> wrote:
>>
>> > Hi,
>> > Can you please try this
>> > http://s.apache.org/wIC
>> > Thanks
>> > Lewis
>> >
>> >
>> > On Thu, Jun 27, 2013 at 8:01 AM, Jamshaid Ashraf <[email protected]
>> > >wrote:
>> >
>> > > Hi,
>> > >
>> > > I'm using Nutch 2.x with HBase and tried to crawl the
>> > > http://www.halliburton.com/en-US/default.page site for depth level 5.
>> > >
>> > > Following is the command:
>> > >
>> > > bin/crawl urls/seed.txt HB http://localhost:8080/solr/ 5
>> > >
>> > >
>> > > It worked well till the 3rd iteration, but for the remaining 4th and
>> > > 5th nothing was fetched (the same happened with cnn.com). But if I
>> > > try to crawl other sites like amazon with depth level 5, it works.
>> > >
>> > > Could you please guide me on what could be the reasons for the 4th
>> > > and 5th iterations failing?
>> > >
>> > >
>> > > Regards,
>> > > Jamshaid
>> > >
>> >
>> >
>> >
>> > --
>> > *Lewis*
>> >
>>
>
>


-- 
Don't Grow Old, Grow Up... :-)
