Hi Lewis,

The 'db.max.outlinks.per.page' parameter is never used in the Nutch 2.x source code.
It is supposed to be enforced by the ParseUtil class in this loop:

    for (int i = 0; count < maxOutlinks && i < outlinks.length; i++)

but the "count" variable is never changed, so the limit is never applied.
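For reference, here is a minimal standalone sketch of the broken limit check
and the one-line fix. The Outlink stand-in class and variable names are
simplified assumptions for illustration, not the exact Nutch source:

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class OutlinkLimitSketch {

        // Hypothetical stand-in for org.apache.nutch.parse.Outlink.
        static class Outlink {
            final String toUrl, anchor;
            Outlink(String toUrl, String anchor) {
                this.toUrl = toUrl;
                this.anchor = anchor;
            }
        }

        public static void main(String[] args) {
            Outlink[] outlinks = {
                new Outlink("http://a.example/", "a"),
                new Outlink("http://b.example/", "b"),
                new Outlink("http://c.example/", "c"),
            };
            int maxOutlinks = 2; // what db.max.outlinks.per.page should cap

            Map<String, String> kept = new LinkedHashMap<>();
            int count = 0;
            for (int i = 0; count < maxOutlinks && i < outlinks.length; i++) {
                kept.put(outlinks[i].toUrl, outlinks[i].anchor);
                count++; // this increment is what is missing in 2.x; without
                         // it, the 'count < maxOutlinks' guard never fires
            }
            System.out.println(kept.size() + " outlinks kept (limit was "
                + maxOutlinks + ")");
        }
    }

Without the count++ line, every outlink on the page is kept no matter what
the configured limit is.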
Canan

On Fri, Jun 28, 2013 at 2:32 PM, Jamshaid Ashraf <[email protected]> wrote:

> Hi,
>
> I have followed the given link and updated 'db.max.outlinks.per.page' to -1
> in the 'nutch-default' file,
> but I am facing the same issue while crawling
> http://www.halliburton.com/en-US/default.page and cnn.com. Below is the
> last line of the fetcher job, which shows 0 pages found on the 3rd or 4th
> iteration:
>
> 0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s, 0 URLs
> in 0 queues
> -activeThreads=0
> FetcherJob: done
>
> Please note that when I crawl Amazon and other sites it works fine. Do you
> think it is because of some restriction on the Halliburton side
> (robots.txt) or some misconfiguration at my end?
>
> Regards,
> Jamshaid
>
>
> On Fri, Jun 28, 2013 at 12:37 AM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
> > Hi,
> > Can you please try this
> > http://s.apache.org/wIC
> > Thanks
> > Lewis
> >
> >
> > On Thu, Jun 27, 2013 at 8:01 AM, Jamshaid Ashraf <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > I'm using Nutch 2.x with HBase and tried to crawl
> > > http://www.halliburton.com/en-US/default.page at depth level 5.
> > >
> > > Following is the command:
> > >
> > > bin/crawl urls/seed.txt HB http://localhost:8080/solr/ 5
> > >
> > > It worked well until the 3rd iteration, but in the remaining 4th and
> > > 5th iterations nothing was fetched (the same happened with cnn.com).
> > > If I try to crawl other sites like Amazon at depth level 5, it works.
> > >
> > > Could you please advise what the reasons might be for the 4th and 5th
> > > iterations failing?
> > >
> > > Regards,
> > > Jamshaid
> >
> > --
> > *Lewis*

