Hello everyone,
I run a crawl command every day, but I don't want Nutch to submit an update
to Solr if a particular page hasn't changed. How do I achieve that? Right
now the value of db.fetch.interval.default doesn't seem to help prevent the
crawl since the updates are submitted to Solr as if
Your email has been received, thank you!
Thank you for the reply. Does it mean that it is not supported in the latest
stable release of Nutch?
-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: den 24 augusti 2012 17:21
To: user@nutch.apache.org; Max Dzyuba
Subject: RE: recrawl a URL?
Hi,
Trunk has
Thanks again! I'll have to test it more then in my 1.5.1.
Best regards,
Max
Markus Jelsma markus.jel...@openindex.io wrote:
Hmm, I had to look it up but it is supported in 1.5 and 1.5.1:
In plugin.xml:
    <requires>
        <import plugin="nutch-extensionpoints"/>
        <import plugin="another_plugin_necessary_id"/>
    </requires>
And in the build.xml of your plugin:
    <target name="deps-jar">
        <ant target="jar" inheritall="false" dir="../path_of_plugin"/>
        <ant target="compile-test" inheritall="false"
This will work only for URLs that have If-Modified-Since headers, but most
URLs do not have this header.
Thanks.
Alex.
-Original Message-
From: Max Dzyuba max.dzy...@comintelli.com
To: Markus Jelsma markus.jel...@openindex.io; user user@nutch.apache.org
Sent: Fri, Aug 24, 2012
Your email has been received, thank you!
On Friday, August 24, 2012, fu xiang hua wrote:
Your email has been received, thank you!
--
Sent from Gmail Mobile
Hi,
The CrawlDatum's score field is added to the document via the `boost` field;
this is not a document boost. You'll have to boost on the field manually to see
the LinkRank value in effect. You can do this with a function query or a boost
query.
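
As a rough illustration of boosting on the `boost` field by hand, the sketch below builds a Solr request URL that uses the edismax parser with either the multiplicative `boost` parameter or the additive `bf` parameter. The Solr URL, the collection name, and the `qf` fields are assumptions for illustration only; only the `boost` field name comes from this thread.

```python
from urllib.parse import urlencode

# Assumed endpoint: adjust host, port, and collection name to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_boosted_query(user_query, multiplicative=True):
    """Return a Solr select URL that factors the `boost` field into scoring."""
    params = {
        "q": user_query,
        "defType": "edismax",
        # Assumed searchable fields; use whatever your schema defines.
        "qf": "title content",
    }
    if multiplicative:
        # edismax `boost`: the relevance score is multiplied by this function.
        params["boost"] = "boost"
    else:
        # dismax/edismax `bf`: the function value is added to the score.
        params["bf"] = "boost"
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_boosted_query("nutch recrawl"))
    print(build_boosted_query("nutch recrawl", multiplicative=False))
```

Whether the multiplicative or the additive form is the better fit depends on the range of the stored scores, so it is worth comparing both against a few known queries.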
Cheers,
Markus
-Original message-
No, the CrawlDatum's status field will be set to db_notmodified if the
signatures match, regardless of the HTTP headers. The header only sets a
fetch_notmodified status, but it is not relevant for the db_* status.
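
The signature comparison described above can be pictured with a small stand-alone sketch. This is not Nutch's code: the function names are hypothetical, the use of an MD5 digest over the raw content is an assumption, and only the `db_notmodified` status name is taken from this thread (`db_fetched` is assumed as the status for a changed page).

```python
import hashlib

# Status names as plain strings; db_fetched is an assumed counterpart.
DB_FETCHED = "db_fetched"
DB_NOTMODIFIED = "db_notmodified"


def content_signature(content: bytes) -> str:
    """Hypothetical signature: an MD5 digest of the fetched content."""
    return hashlib.md5(content).hexdigest()


def db_status_after_fetch(previous_signature, content: bytes):
    """Compare the new signature with the stored one.

    HTTP headers play no part here: only the two signatures are compared.
    Returns a (status, new_signature) pair.
    """
    new_signature = content_signature(content)
    if previous_signature is not None and previous_signature == new_signature:
        return DB_NOTMODIFIED, new_signature
    return DB_FETCHED, new_signature


if __name__ == "__main__":
    stored = content_signature(b"<html>unchanged page</html>")
    print(db_status_after_fetch(stored, b"<html>unchanged page</html>")[0])
    print(db_status_after_fetch(stored, b"<html>edited page</html>")[0])
```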
-Original message-
From:alx...@aim.com alx...@aim.com
Sent: Fri 24-Aug-2012