Hi Markus,

I was wondering what exactly you mean by dynamic. Is it a different value per fetch cycle that applies to all queues, or do you mean a different value for each queue? (For example, when the type is HOST, hostA would have a different generate.max.count than hostB.)

Ferdy.

On 11/04/2011 12:32 AM, Markus Jelsma wrote:
Hi,

The generate.max.count setting defines the maximum number of records per queue
of a given type. We're looking for an improvement to make this setting dynamic.
The new variable would be the total number of records for that type of queue
(ip, host, domain).

How can we adapt the generator for this? The problem is that there's no
information on the number of records for a given URL.

Any thoughts? Could we perhaps modify the updater to count the number of
records for a queue and write it to the CrawlDatum without building a new
updater tool based on the information provided by the current domainstatistics
tool?
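
To make the idea concrete, here is a rough sketch in Python (not Nutch code) of counting records per queue and deriving a per-queue limit from that count. The function names, the host-based grouping, and the "fixed percentage of the queue's total" rule are all assumptions for illustration only; the actual rule for the dynamic value is still open.

```python
from collections import Counter
from urllib.parse import urlsplit


def queue_key(url):
    # Assumption: queues are grouped by host; ip or domain grouping
    # would need a different key function.
    return urlsplit(url).hostname


def count_per_queue(urls):
    # Count how many known records fall into each queue.
    return Counter(queue_key(u) for u in urls)


def dynamic_max_count(total_records, percent=10, floor=1):
    # Hypothetical rule: allow a fixed percentage of the queue's total
    # records per generate cycle, but never fewer than `floor`.
    return max(floor, total_records * percent // 100)


urls = [
    "http://a.example/1",
    "http://a.example/2",
    "http://a.example/3",
    "http://b.example/1",
]
counts = count_per_queue(urls)
limits = {host: dynamic_max_count(n) for host, n in counts.items()}
```

With a rule like this, the per-queue count would have to be available at generate time, which is why storing it alongside each record comes up above.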

Thanks
