Cool.

Thanks & Regards,
Preetam
On Sun, Feb 8, 2015 at 11:16 AM, Mattmann, Chris A (3980) <[email protected]> wrote:

> Thanks Preetam:
>
> >[..snip..]
> >Why would you want to?
> >
> >Preetam: I was just curious whether this could be handled manually, if
> >possible. I was anticipating that once the fetch has completed and
> >CrawlDB holds the data for all URLs crawled to depth 2, the next run
> >would not crawl the same URLs again.
> >Is it that URLs discovered at depth 2 are kept unfetched in the queue
> >(not dequeued, since the depth threshold had already been reached) due
> >to the depth value constraint, and are hence fetched in the next run,
> >resulting in an increase in fetch size?
>
> +1. Yep.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
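
The behavior confirmed above can be made visible in the CrawlDB status counts. Below is a minimal sketch of a two-depth crawl using the Nutch 1.x CLI; the layout (seed list in urls/, database at crawl/crawldb, segments under crawl/segments) and the -topN value are assumptions for illustration, not anything stated in the thread.

  # Seed the CrawlDB (paths are assumed for illustration).
  bin/nutch inject crawl/crawldb urls/

  # One generate/fetch/parse/updatedb round per depth level.
  for depth in 1 2; do
    bin/nutch generate crawl/crawldb crawl/segments -topN 1000
    segment=$(ls -d crawl/segments/* | sort | tail -1)   # newest segment
    bin/nutch fetch "$segment"
    bin/nutch parse "$segment"
    # updatedb merges the parsed outlinks back into the CrawlDB,
    # where they sit with status db_unfetched.
    bin/nutch updatedb crawl/crawldb "$segment"
  done

  # Links discovered while parsing the depth-2 pages remain in the CrawlDB
  # as db_unfetched; the next run's generate step selects them, which is
  # why the fetch size grows instead of staying flat.
  bin/nutch readdb crawl/crawldb -stats

The -stats report breaks the CrawlDB down by status (db_fetched, db_unfetched, and so on), so running it between rounds shows the carried-over queue that the next crawl run will pick up.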

