recrawl a URL?

2012-08-24 Thread Max Dzyuba
Hello everyone, I run a crawl command every day, but I don't want Nutch to submit an update to Solr if a particular page hasn't changed. How do I achieve that? Right now the value of db.fetch.interval.default doesn't seem to help prevent the crawl since the updates are submitted to Solr as if

Auto-Re: recrawl a URL?

2012-08-24 Thread fu xiang hua
Your email has been received, thank you!

RE: recrawl a URL?

2012-08-24 Thread Max Dzyuba
Thank you for the reply. Does it mean that it is not supported in the latest stable release of Nutch? -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: den 24 augusti 2012 17:21 To: user@nutch.apache.org; Max Dzyuba Subject: RE: recrawl a URL? Hi, Trunk has

RE: recrawl a URL?

2012-08-24 Thread Max Dzyuba
Thanks again! I'll have to test it more in my 1.5.1, then. Best regards, Max. Markus Jelsma markus.jel...@openindex.io wrote: Hmm, I had to look it up, but it is supported in 1.5 and 1.5.1:

Re: Dependencies between Plugin

2012-08-24 Thread hugo.ma
In plugin.xml: <requires> <import plugin="nutch-extensionpoints"/> <import plugin="another_plugin_necessary_id"/> </requires> And in the build.xml of your plugin: <target name="deps-jar"> <ant target="jar" inheritall="false" dir="../path_of_plugin"/> <ant target="compile-test" inheritall="false"

Re: recrawl a URL?

2012-08-24 Thread alxsss
This will work only for URLs that have If-Modified-Since headers, but most URLs do not have this header. Thanks. Alex. -Original Message- From: Max Dzyuba max.dzy...@comintelli.com To: Markus Jelsma markus.jel...@openindex.io; user user@nutch.apache.org Sent: Fri, Aug 24, 2012

Auto-Re: LINK RANK CRAWL DATUM SCORE

2012-08-24 Thread fu xiang hua
Your email has been received, thank you!

Re: Auto-Re: LINK RANK CRAWL DATUM SCORE

2012-08-24 Thread J. Delgado
On Friday, August 24, 2012, fu xiang hua wrote: Your email has been received, thank you! -- Sent from Gmail Mobile

RE: LINK RANK CRAWL DATUM SCORE

2012-08-24 Thread Markus Jelsma
Hi, The CrawlDatum's score field is added to the document via the `boost` field; this is not a document boost. You'll have to boost on the field manually to see the LinkRank value in effect. You can do this with a function query or a boost query. Cheers, Markus -Original message-
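
For readers following this thread, here is a minimal sketch of the two options mentioned above (a function query or a boost query), written as Python that only builds Solr request parameters. It assumes the edismax query parser and the `boost` field named in the message. The Solr URL, the helper names `linkrank_params` and `build_url`, the `log(sum(boost,1))` damping, and the `boost:[1.0 TO *]` range are illustrative assumptions, not something stated in the thread.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance; adjust to your own setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def linkrank_params(user_query, use_function_query=True):
    """Build Solr parameters that factor the `boost` field into scoring."""
    params = {
        "q": user_query,
        "defType": "edismax",  # parser that understands bf and bq
    }
    if use_function_query:
        # bf: function query whose value is added to the score.
        # The log/sum damping is an illustrative choice, not from the thread.
        params["bf"] = "log(sum(boost,1))"
    else:
        # bq: extra query clause that raises matching documents.
        # The range threshold is hypothetical.
        params["bq"] = "boost:[1.0 TO *]"
    return params


def build_url(user_query, use_function_query=True):
    """Return a full select URL for the given query (hypothetical helper)."""
    return SOLR_SELECT_URL + "?" + urlencode(
        linkrank_params(user_query, use_function_query)
    )
```

Calling `build_url("nutch")` yields a request using the additive function query, while `build_url("nutch", use_function_query=False)` uses the boost query instead.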

RE: recrawl a URL?

2012-08-24 Thread Markus Jelsma
No, the CrawlDatum's status field will be set to db_notmodified if the signatures match, regardless of the HTTP headers. The header only sets fetch_notmodified, which is not relevant for the db_* status. -Original message- From: alx...@aim.com alx...@aim.com Sent: Fri 24-Aug-2012
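
The signature comparison described above can be modelled roughly as follows. This is a simplified sketch, not Nutch's actual code: it assumes an MD5 digest of the raw content as the signature, and the function names `signature`, `db_status`, and `should_reindex` are hypothetical. Only the status names `db_notmodified` and `db_fetched` are Nutch's own.

```python
import hashlib


def signature(content: bytes) -> str:
    """MD5 digest of the page content, standing in for a page signature."""
    return hashlib.md5(content).hexdigest()


def db_status(previous_signature, fetched_content: bytes):
    """Pick a db_* status by comparing signatures, ignoring HTTP headers.

    Returns the status name and the new signature to store.
    """
    new_signature = signature(fetched_content)
    if previous_signature is not None and previous_signature == new_signature:
        # Same content as the last fetch: the page is treated as unchanged.
        return "db_notmodified", new_signature
    return "db_fetched", new_signature


def should_reindex(status: str) -> bool:
    """Hypothetical downstream check: skip pages that did not change."""
    return status != "db_notmodified"
```

In this model a page with no If-Modified-Since support still ends up as `db_notmodified` when its content is byte-identical to the previous fetch, which is the point made in the reply.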