Hi Everyone,

I would like to know whether it is possible to generate multiple fetchlists
from the generator by 'Host' or any other user-specified criterion (such as a
regex).
If a single large fetchlist is generated, the fetcher runs for too long. It
would be nice if the URLs could be split into separate fetchlists by some
criterion, making it easier to analyze large crawls without having to wait
for the entire fetch job to finish.

I was reading the documentation at
http://wiki.apache.org/nutch/bin/nutch%20generate
The numFetchers and maxNumSegments parameters do mention generating multiple
fetch partitions and segments, and generate.max.count and generate.count.mode
allow some configuration.

But I did not understand whether it is possible to generate multiple
fetchlists (I am currently working in local mode).

Thank you.

Regards,
Sujen Shah
M.S - Computer Science (Class of 2016)
University of Southern California
http://www.linkedin.com/in/sujenshah
