Re: about time for recrawl a url

2014-04-30 Thread A Laxmi
the time for recrawling a URL. Any idea where I can learn about that? I'm using Nutch 1.5.1. I know that initially the next fetch time is based on the db.fetch.interval.default property and that this time then changes according to db.fetch.schedule.adaptive.inc_rate

about time for recrawl a url

2013-09-06 Thread Eyeris Rodriguez Rueda
Hi all. I want to know about the time for recrawling a URL. Any idea where I can learn about that? I'm using Nutch 1.5.1. I know that initially the next fetch time is based on the db.fetch.interval.default property and that this time then changes according to db.fetch.schedule.adaptive.inc_rate
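
For readers following this thread, here is a rough Python sketch of how an adaptive schedule of this kind adjusts the re-fetch interval: it grows by inc_rate when a page is unchanged and shrinks by dec_rate when it has changed. This is an illustration only, not Nutch's code. The function name, the numeric defaults (30 days, 0.4, 0.2) and the min/max bounds are assumptions, and it only matters if the adaptive schedule is actually configured as the fetch schedule.

```python
def next_fetch_interval(interval, modified, inc_rate=0.4, dec_rate=0.2,
                        min_interval=60.0, max_interval=31536000.0):
    """Return the next re-fetch interval in seconds (hypothetical helper).

    All default values here are assumptions chosen for illustration.
    """
    if modified:
        # Page changed: come back sooner.
        interval *= (1.0 - dec_rate)
    else:
        # Page unchanged: wait longer before the next fetch.
        interval *= (1.0 + inc_rate)
    # Keep the result inside the configured bounds.
    return max(min_interval, min(interval, max_interval))


default_interval = 2592000.0  # assumed value of db.fetch.interval.default (30 days)
after_unchanged = next_fetch_interval(default_interval, modified=False)
after_changed = next_fetch_interval(default_interval, modified=True)
print(after_unchanged, after_changed)
```

With these assumed rates, an unchanged page moves from 30 days to roughly 42 days, and a changed page moves to roughly 24 days.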

Re: about time for recrawl a url

2013-09-06 Thread feng lu
at 10:26 PM, Eyeris Rodriguez Rueda eru...@uci.cu wrote: Hi all. I want to know about the time for recrawling a URL. Any idea where I can learn about that? I'm using Nutch 1.5.1. I know that initially the next fetch time is based on the db.fetch.interval.default property

RE: recrawl a URL?

2012-08-30 Thread Max Dzyuba
[mailto:lewis.mcgibb...@gmail.com] Sent: 27 August 2012 15:03 To: user@nutch.apache.org Subject: Re: recrawl a URL? The crawldb needs to receive updates from the data in fetched segments; once you run generate, it will calculate what needs to be fetched in the next iteration. It is OK to store segments in different

Re: recrawl a URL?

2012-08-30 Thread Lewis John Mcgibbney
Hi Max, On Tue, Aug 28, 2012 at 3:24 PM, Max Dzyuba max.dzy...@comintelli.com wrote: Is it possible to use the same crawldb but store segment data in a different directory for consecutive crawls using the bin/nutch crawl command? I thought that there is no option to specify the path to crawldb

Re: recrawl a URL?

2012-08-30 Thread Rémy Amouroux
In order to store the crawldb and the segments in different directories, you will have to use the inject, generate, fetch, parse and updatedb commands. Those commands allow you to define both the crawldb and segments paths. The only way I see in Nutch 1.4 to do this using the crawl command is to move
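
As a rough sketch of the step-by-step cycle described above, the Python snippet below only builds the argument lists for one round and runs nothing. The helper name and all paths are hypothetical. The argument order is assumed from the usual Nutch 1.x usage messages, so check it against the output of bin/nutch on your version. In a real run the segment directory is created by generate, so its name is not known in advance.

```python
def crawl_cycle_commands(crawldb, segments_dir, segment, seed_dir,
                         nutch="bin/nutch"):
    """Build the commands for one crawl round (hypothetical helper).

    crawldb and segments_dir are independent paths, which is the point
    of running the steps individually instead of the crawl command.
    """
    return [
        [nutch, "inject", crawldb, seed_dir],        # seed URLs into the crawldb
        [nutch, "generate", crawldb, segments_dir],  # create a new segment to fetch
        [nutch, "fetch", segment],                   # fetch the generated segment
        [nutch, "parse", segment],                   # parse the fetched content
        [nutch, "updatedb", crawldb, segment],       # feed results back into the crawldb
    ]


# Example paths are placeholders, not taken from the thread.
commands = crawl_cycle_commands(
    crawldb="data/crawldb",
    segments_dir="other/segments",
    segment="other/segments/SEGMENT_NAME",
    seed_dir="urls",
)
for argv in commands:
    print(" ".join(argv))
```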

RE: recrawl a URL?

2012-08-30 Thread Max Dzyuba
To: user@nutch.apache.org Subject: Re: recrawl a URL? Hi Max, On Tue, Aug 28, 2012 at 3:24 PM, Max Dzyuba max.dzy...@comintelli.com wrote: Is it possible to use the same crawldb but store segment data in a different directory for consecutive crawls using the bin/nutch crawl command? I thought

RE: recrawl a URL?

2012-08-28 Thread Max Dzyuba
Nutch 1.5. If it's possible, what would the crawl command look like? Thanks for your help! -Original Message- From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com] Sent: 27 August 2012 15:03 To: user@nutch.apache.org Subject: Re: recrawl a URL? The crawldb needs to receive

RE: recrawl a URL?

2012-08-27 Thread Max Dzyuba
[mailto:markus.jel...@openindex.io] Sent: 24 August 2012 21:26 To: user@nutch.apache.org; alx...@aim.com Subject: RE: recrawl a URL? No, the CrawlDatum's status field will be set to db_notmodified if the signatures match, regardless of the HTTP headers. The header only sets a fetch_notmodified
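
The signature comparison mentioned above can be pictured with a small Python sketch: hash the fetched content, compare it with the signature stored from the previous fetch, and mark the page as not modified when they match, whatever the HTTP headers said. This is not Nutch's code. The function names are hypothetical, using MD5 over the raw bytes is an assumption made for illustration, and db_fetched is assumed as the status for a changed or new page.

```python
import hashlib


def signature(content: bytes) -> str:
    """Content signature; MD5 of the raw bytes is an assumption here."""
    return hashlib.md5(content).hexdigest()


def db_status(old_signature, new_content: bytes):
    """Return (status, new_signature) for a freshly fetched page.

    HTTP headers play no part: only the stored and new signatures
    are compared.
    """
    new_sig = signature(new_content)
    if old_signature is not None and old_signature == new_sig:
        return "db_notmodified", new_sig
    return "db_fetched", new_sig


first_status, sig = db_status(None, b"<html>hello</html>")
second_status, sig2 = db_status(sig, b"<html>hello</html>")
third_status, sig3 = db_status(sig2, b"<html>hello, world</html>")
print(first_status, second_status, third_status)
```

An indexing step could then skip pages whose status is db_notmodified, which is the behaviour the original question asks for.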

Re: recrawl a URL?

2012-08-27 Thread Lewis John Mcgibbney
the page has been changed since the last crawl? Thanks, Max -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: 24 August 2012 21:26 To: user@nutch.apache.org; alx...@aim.com Subject: RE: recrawl a URL? No, the CrawlDatum's status field will be set

recrawl a URL?

2012-08-24 Thread Max Dzyuba
Hello everyone, I run a crawl command every day, but I don't want Nutch to submit an update to Solr if a particular page hasn't changed. How do I achieve that? Right now the value of db.fetch.interval.default doesn't seem to help prevent the crawl since the updates are submitted to Solr as if

Auto-Re: recrawl a URL?

2012-08-24 Thread fu xiang hua
Your email has been received, thank you!

RE: recrawl a URL?

2012-08-24 Thread Max Dzyuba
Thank you for the reply. Does it mean that it is not supported in the latest stable release of Nutch? -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: 24 August 2012 17:21 To: user@nutch.apache.org; Max Dzyuba Subject: RE: recrawl a URL? Hi, Trunk has

RE: recrawl a URL?

2012-08-24 Thread Max Dzyuba
/IndexerMapReduce.java?view=markup -Original message- From: Max Dzyuba max.dzy...@comintelli.com Sent: Fri 24-Aug-2012 17:35 To: Markus Jelsma markus.jel...@openindex.io; user@nutch.apache.org Subject: RE: recrawl a URL? Thank you for the reply. Does it mean that it is not supported in the latest stable

Re: recrawl a URL?

2012-08-24 Thread alxsss
9:02 am Subject: RE: recrawl a URL? Thanks again! I'll have to test it more in my 1.5.1, then. Best regards, Max. Markus Jelsma markus.jel...@openindex.io wrote: Hmm, I had to look it up, but it is supported in 1.5 and 1.5.1: http://svn.apache.org/viewvc/nutch/tags/release-1.5.1/src/java/org

RE: recrawl a URL?

2012-08-24 Thread Markus Jelsma
20:14 To: user@nutch.apache.org; max.dzy...@comintelli.com Subject: Re: recrawl a URL? This will work only for URLs that have If-Modified-Since headers. But most URLs do not have this header. Thanks. Alex. -Original Message- From: Max Dzyuba max.dzy