Thanks Preetam:
> [..snip..]
> Why would you want to?
>
> Preetam: I was just curious whether this can be handled manually.
> I was anticipating that once the fetch has run and the CrawlDB holds the
> crawled data for all URLs down to depth 2, the next run would not crawl
> the same URLs again.
> Is it that URLs discovered at depth 2 are kept unfetched in the queue
> (never dequeued, since the crawl has already reached the depth threshold
> that was passed in) and are therefore fetched in the next run, resulting
> in the increase in fetch size?

+1. Yep.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
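To make the mechanism concrete, here is a minimal toy sketch of a
depth-limited generate/fetch/update loop, assuming a synthetic link graph
in which every page has exactly two outlinks. The names crawl_run and
outlinks are hypothetical; this is an illustrative model, not Nutch's
actual code:

    # Toy model of a depth-limited crawl cycle (hypothetical; not Nutch code).
    # Each "run" performs `rounds` generate/fetch/update cycles against a
    # shared crawldb, in the spirit of a depth-2 crawl.

    def outlinks(url):
        # Hypothetical link graph: every page links to two children.
        return [f"{url}/{i}" for i in (0, 1)]

    def crawl_run(crawldb, rounds):
        fetched_this_run = 0
        for _ in range(rounds):
            # "generate": select every URL still marked unfetched
            segment = [u for u, s in crawldb.items() if s == "unfetched"]
            if not segment:
                break
            for url in segment:
                # "fetch": mark the URL as fetched
                crawldb[url] = "fetched"
                fetched_this_run += 1
                # "updatedb": newly discovered outlinks enter the db as
                # unfetched; on the final round they are never fetched here
                for link in outlinks(url):
                    crawldb.setdefault(link, "unfetched")
        return fetched_this_run

    crawldb = {"http://example.com": "unfetched"}
    for run in (1, 2, 3):
        fetched = crawl_run(crawldb, rounds=2)
        pending = sum(1 for s in crawldb.values() if s == "unfetched")
        print(f"run {run}: fetched {fetched}, left unfetched {pending}")

With this toy graph the fetch count grows run over run (3, then 12,
then 48): each re-run generates and fetches exactly the URLs the previous
run discovered on its final round but never dequeued, which is why the
fetch size keeps increasing.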

