Hi Chen,

I don't think the limit applies to ONE page, but to ONE fetching phase (cycle). In my previous example, the feed URLs are:

http://www.a.com/index.php (90 outlinks)
http://www.b.com/index.jsp (80 outlinks)
http://www.c.com/index.html (50 outlinks)

90 + 80 + 50 = 220 outlinks, and they are all different. I used the protocol-httpclient plugin. In one fetching cycle, if the total number of outlinks fetched reaches 100, then the others will be abandoned. Right?

/Jack

On 9/8/05, AJ Chen <[EMAIL PROTECTED]> wrote:
> My understanding is that only up to the maximum number of outlinks are
> processed for a page when updating the web db. I assume the same page
> won't get fetched and processed again in the next fetch/update cycles,
> then you won't get those outlinks exceeding the maximum number no matter
> how many cycles you are running.
>
> To make sure all of the outlinks are processed for a page, the
> db.max.outlinks.per.page must be set to a number that is larger than the
> number of outlinks on the page. If this is true, then the max number
> has to be determined in real time since the number of outlinks varies
> from page to page.
>
> Is my understanding correct?
>
> AJ
>
>
> Jack Tang wrote:
>
> >Hi All
> >
> >Here is the "db.max.outlinks.per.page" property and its description in
> >nutch-default.xml
> > <property>
> >   <name>db.max.outlinks.per.page</name>
> >   <value>100</value>
> >   <description>The maximum number of outlinks that we'll process for
> >   a page.
> >   </description>
> > </property>
> >
> >I don't think the description is right.
> >Say, my crawler feeds are:
> >http://www.a.com/index.php (90 outlinks)
> >http://www.b.com/index.jsp (80 outlinks)
> >http://www.c.com/index.html (50 outlinks)
> >
> >and the number of crawler threads is 30. Do you think the remaining URLs
> >((80 - 10) outlinks + 50 outlinks) will be fetched?
> >
> >I think the description should be "The maximum number of outlinks in
> >one fetching phase."
> >
> >
> >Regards
> >/Jack
> >
> >
>
> --
> AJ (Anjun) Chen, Ph.D.
> Canova Bioconsulting
> Marketing * BD * Software Development
> 748 Matadero Ave., Palo Alto, CA 94306, USA
> Cell 650-283-4091, [EMAIL PROTECTED]
> ---------------------------------------------------
>
>

--
Keep Discovering ... ...
http://www.jroller.com/page/jmars
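
P.S. To make the two readings discussed in this thread concrete, here is a small sketch in Python. It is only an illustration and not Nutch code: the function names kept_per_page and kept_per_cycle are made up for this example, and which reading matches what Nutch really does is exactly the open question here. The outlink counts and the limit of 100 are the ones from the example above.

```python
# Illustration only: hypothetical helpers, not part of Nutch.
MAX_OUTLINKS = 100  # db.max.outlinks.per.page as quoted from nutch-default.xml

# Outlink counts for the three feed URLs used in this thread.
pages = {
    "http://www.a.com/index.php": 90,
    "http://www.b.com/index.jsp": 80,
    "http://www.c.com/index.html": 50,
}


def kept_per_page(pages, limit):
    """Reading 1: the limit is applied to each page separately."""
    return {url: min(count, limit) for url, count in pages.items()}


def kept_per_cycle(pages, limit):
    """Reading 2: the limit is one budget shared by the whole fetch cycle."""
    kept = {}
    remaining = limit
    for url, count in pages.items():
        taken = min(count, remaining)  # take what is left of the budget
        kept[url] = taken
        remaining -= taken
    return kept


per_page = kept_per_page(pages, MAX_OUTLINKS)
per_cycle = kept_per_cycle(pages, MAX_OUTLINKS)

print("per page :", per_page, "total", sum(per_page.values()))
print("per cycle:", per_cycle, "total", sum(per_cycle.values()))
```

Under the per-page reading nothing is dropped, because no single page has more than 100 outlinks. Under the per-cycle reading, (80 - 10) + 50 = 120 outlinks would be dropped, which is the case I am asking about.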
