Thanks Chen, I will try that :)

On 9/8/05, AJ Chen <[EMAIL PROTECTED]> wrote:
> Jack,
> Set the max to 100, but run 10 cycles (i.e., depth=10) with the
> CrawlTool. You may see all the outlinks are collected toward the end. 3
> cycles is usually not enough.
> -AJ
>
> Jack Tang wrote:
>
> >Yes, Stefan.
> >But it missed some URLs. I set the value to 3000, and then everything is OK.
> >
> >/Jack
> >
> >On 9/8/05, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
> >
> >>Jack,
> >>That is max outlinks per html page.
> >>All your example pages have fewer than 100 outlinks, right?!
> >>Stefan
> >>
> >>On 07.09.2005 at 18:43, Jack Tang wrote:
> >>
> >>>Hi All
> >>>
> >>>Here is the "db.max.outlinks.per.page" property and its description in
> >>>nutch-default.xml
> >>> <property>
> >>>   <name>db.max.outlinks.per.page</name>
> >>>   <value>100</value>
> >>>   <description>The maximum number of outlinks that we'll
> >>>process for a page.
> >>>   </description>
> >>> </property>
> >>>
> >>>I don't think the description is right.
> >>>Say, my crawler feeds are:
> >>>http://www.a.com/index.php (90 outlinks)
> >>>http://www.b.com/index.jsp (80 outlinks)
> >>>http://www.c.com/index.html (50 outlinks)
> >>>
> >>>and the number of crawler threads is 30. Do you think the remaining URLs
> >>>( (80 - 10) outlinks + 50 outlinks) will be fetched?
> >>>
> >>>I think the description should be "The maximum number of outlinks in
> >>>one fetching phase."
> >>>
> >>>Regards
> >>>/Jack
> >>>--
> >>>Keep Discovering ... ...
> >>>http://www.jroller.com/page/jmars
> >>>
> >>---------------------------------------------------------------
> >>company: http://www.media-style.com
> >>forum: http://www.text-mining.org
> >>blog: http://www.find23.net
> >>
> >
>
> --
> AJ (Anjun) Chen, Ph.D.
> Canova Bioconsulting
> Marketing * BD * Software Development
> 748 Matadero Ave., Palo Alto, CA 94306, USA
> Cell 650-283-4091, [EMAIL PROTECTED]
> ---------------------------------------------------
>
--
Keep Discovering ... ...
http://www.jroller.com/page/jmars
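
For anyone reading this thread later, the two readings of "db.max.outlinks.per.page" discussed here can be compared with a small sketch. This is not Nutch code: the function names `cap_per_page` and `cap_per_phase` and the generated link URLs are hypothetical, and only the outlink counts (90, 80, 50) and the limit of 100 come from the messages in this thread. It only illustrates the arithmetic, and does not try to explain why some URLs were missed with the default value.

```python
def cap_per_page(pages, max_per_page=100):
    """Stefan's reading: each page's outlink list is truncated on its own."""
    return {url: links[:max_per_page] for url, links in pages.items()}


def cap_per_phase(pages, max_total=100):
    """Jack's proposed reading: one shared budget for the whole fetch phase."""
    budget = max_total
    kept = {}
    for url, links in pages.items():
        kept[url] = links[:budget]      # take what the remaining budget allows
        budget -= len(kept[url])        # budget never drops below zero
    return kept


def count(kept):
    """Total number of outlinks kept across all pages."""
    return sum(len(links) for links in kept.values())


# Hypothetical seed pages using the outlink counts from the thread.
seeds = {
    "http://www.a.com/index.php": [f"http://www.a.com/{i}" for i in range(90)],
    "http://www.b.com/index.jsp": [f"http://www.b.com/{i}" for i in range(80)],
    "http://www.c.com/index.html": [f"http://www.c.com/{i}" for i in range(50)],
}

per_page_total = count(cap_per_page(seeds))    # 90 + 80 + 50 = 220
per_phase_total = count(cap_per_phase(seeds))  # 90 + 10 + 0 = 100

print(per_page_total, per_phase_total)
```

Under the per-page reading none of the three seed pages loses any outlinks, since each has fewer than 100. Under the per-phase reading, (80 - 10) + 50 = 120 outlinks would be dropped, which is the case the original question was worried about.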
