Re: Per-host fetch-interval

Andrzej Bialecki Wed, 24 Jun 2009 01:40:44 -0700

Sandeep Tata wrote:

Hi,


I was wondering what would be the best way to configure per-host
re-crawl intervals. The default db.fetch.interval applies to all URLs,
but I'd like for some hosts to be recrawled more frequently. Is there
a JIRA ticket open on this? I haven't been able to find one.

Fetch interval can be set on individual CrawlDatum-s in crawldb, at least technically speaking. In practice, there is no command-line tool to do this, and I don't think there is a JIRA on this.

One idea would be to modify the Injector to accept a list of URL-s with matching metadata, and among others use a predefined metadata like fetchInterval. On the initial injection, all values in CrawlDatum would be set according to the metadata (or set to defaults). On subsequent injections, if a URL already exists in CrawlDb, its metadata would be reset to the values supplied in the injector file.

This should be easy to implement, and I think it would support your use case.
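
To make the injector idea concrete, here is a rough stand-alone sketch in plain Java. It does not use any Nutch classes. The seed-line format (URL followed by tab-separated key=value pairs), the Entry class standing in for CrawlDatum, the in-memory map standing in for CrawlDb, and the 30-day default are all assumptions made for illustration, not the actual Injector behaviour. It also assumes that a re-injection overwrites only the keys supplied on the seed line and leaves other stored values alone.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Stand-alone sketch of metadata-aware injection; no Nutch classes are used. */
class SeedInjectorSketch {

    /** Hypothetical default interval in seconds (30 days), standing in for the configured default. */
    static final int DEFAULT_FETCH_INTERVAL = 30 * 24 * 60 * 60;

    /** Hypothetical stand-in for a CrawlDatum: an interval plus free-form metadata. */
    static final class Entry {
        int fetchInterval = DEFAULT_FETCH_INTERVAL;
        final Map<String, String> metadata = new LinkedHashMap<>();
    }

    /**
     * Injects one seed line of the assumed form
     *   url TAB key=value TAB key=value ...
     * into db (a map standing in for CrawlDb, keyed by URL).
     * A new URL starts from defaults; an existing URL has the supplied
     * values overwritten. The key "fetchInterval" is treated specially.
     */
    static Entry inject(Map<String, Entry> db, String seedLine) {
        String[] fields = seedLine.trim().split("\t");
        String url = fields[0];
        Entry entry = db.computeIfAbsent(url, u -> new Entry());
        for (int i = 1; i < fields.length; i++) {
            int eq = fields[i].indexOf('=');
            if (eq <= 0) {
                continue; // skip fields that are not key=value
            }
            String key = fields[i].substring(0, eq).trim();
            String value = fields[i].substring(eq + 1).trim();
            if (key.equals("fetchInterval")) {
                entry.fetchInterval = Integer.parseInt(value);
            } else {
                entry.metadata.put(key, value);
            }
        }
        return entry;
    }

    public static void main(String[] args) {
        Map<String, Entry> db = new HashMap<>();
        // Initial injection: one host with a short interval, one with defaults.
        inject(db, "http://news.example.com/\tfetchInterval=3600");
        inject(db, "http://archive.example.com/");
        // Subsequent injection: the existing entry is reset to the new values.
        inject(db, "http://news.example.com/\tfetchInterval=900\tsection=front");
        db.forEach((url, e) ->
            System.out.println(url + " " + e.fetchInterval + " " + e.metadata));
    }
}
```

Per-host intervals would then come down to generating the seed list so that every URL of a given host carries the same fetchInterval value. A real implementation would do the merge inside the Injector's own job rather than in an in-memory map.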


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
