… at 10:26 PM, Eyeris Rodriguez Rueda <eru...@uci.cu> wrote:
Hi all.
I want to know how Nutch decides when to recrawl a URL. Any idea where I can learn about that?
I'm using Nutch 1.5.1.
I know that initially the next fetch time is based on the db.fetch.interval.default property, and that this interval then changes according to db.fetch.schedule.adaptive.inc_rate.
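The thread does not spell out how the interval evolves, so here is a simplified, hypothetical model of the adaptive schedule as we understand it. The inc_rate (and its counterpart dec_rate) only apply when db.fetch.schedule.class is set to org.apache.nutch.crawl.AdaptiveFetchSchedule; the default schedule keeps db.fetch.interval.default fixed. An unchanged page has its interval multiplied by (1 + inc_rate), a changed page by (1 - dec_rate), the result is clamped between a minimum and a maximum interval, and generate selects the URL once its next fetch time has passed. The class name is made up, the constants are what we believe the defaults to be (check nutch-default.xml for your release), and the real schedule has further options such as sync_delta that are left out here.

```java
/**
 * Hypothetical, simplified model of an adaptive re-fetch schedule.
 * Not the Nutch API; all constants are assumed defaults.
 */
final class AdaptiveScheduleSketch {
    static final double INC_RATE = 0.4;               // assumed db.fetch.schedule.adaptive.inc_rate
    static final double DEC_RATE = 0.2;               // assumed db.fetch.schedule.adaptive.dec_rate
    static final long MIN_INTERVAL_SEC = 60L;         // assumed db.fetch.schedule.adaptive.min_interval
    static final long MAX_INTERVAL_SEC = 31_536_000L; // assumed db.fetch.schedule.adaptive.max_interval (365 days)

    /** New interval in seconds: grow it if the page was unchanged, shrink it if modified, then clamp. */
    static long nextInterval(long intervalSec, boolean modified) {
        double factor = modified ? (1.0 - DEC_RATE) : (1.0 + INC_RATE);
        long next = Math.round(intervalSec * factor);
        return Math.max(MIN_INTERVAL_SEC, Math.min(MAX_INTERVAL_SEC, next));
    }

    /** Next fetch time in epoch millis: the time of this fetch plus the new interval. */
    static long nextFetchTime(long fetchTimeMillis, long intervalSec) {
        return fetchTimeMillis + intervalSec * 1000L;
    }

    /** A URL becomes eligible for generate once its next fetch time is not in the future. */
    static boolean isDue(long nextFetchTimeMillis, long nowMillis) {
        return nextFetchTimeMillis <= nowMillis;
    }

    public static void main(String[] args) {
        long interval = 2_592_000L; // db.fetch.interval.default: 30 days in seconds
        for (int fetch = 1; fetch <= 3; fetch++) {
            interval = nextInterval(interval, false); // assume the page never changes
            System.out.printf("after unchanged fetch %d: %.1f days%n", fetch, interval / 86_400.0);
        }
    }
}
```

With the assumed rates, a page that keeps coming back unchanged is revisited less and less often until the maximum interval caps the growth, while a page that changes on every visit converges towards the minimum.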
From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com]
Sent: 27 August 2012 15:03
To: user@nutch.apache.org
Subject: Re: recrawl a URL?
The crawldb needs to receive updates from the data in fetched segments; once you run generate, it calculates what needs to be fetched in the next iteration. It is OK to store segments in different …
Hi Max,
On Tue, Aug 28, 2012 at 3:24 PM, Max Dzyuba <max.dzy...@comintelli.com> wrote:
Is it possible to use the same crawldb but store segment data in a different directory for consecutive crawls using the bin/nutch crawl command? I thought that there was no option to specify the path to the crawldb …
In order to store the crawldb and the segments in different directories, you will have to use the inject, generate, fetch, parse and updatedb commands. Those commands allow you to define both the crawldb and the segments paths.
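A rough sketch of what using the individual commands looks like: the hypothetical CrawlSteps helper below only assembles the bin/nutch argument lists for one iteration, with the crawldb and the segments directory passed as independent paths, so a shared crawldb can sit next to a different segments directory per crawl. The argument order (inject <crawldb> <url_dir>, generate <crawldb> <segments_dir>, fetch and parse <segment>, updatedb <crawldb> <segment>) is the Nutch 1.x usage as we know it; check each command's usage message on your version. All paths are made up, and the segment name is a timestamped directory that generate creates, so the one in main is only a placeholder.

```java
import java.util.List;

/** Hypothetical helper: builds bin/nutch argument lists for one crawl iteration. */
final class CrawlSteps {
    /** Seeds the crawldb from a directory of URL lists; usually needed only once. */
    static List<String> inject(String crawlDb, String seedDir) {
        return List.of("bin/nutch", "inject", crawlDb, seedDir);
    }

    /**
     * One iteration: generate writes a new segment under segmentsDir; that segment is then
     * fetched, parsed, and merged back into the crawldb with updatedb.
     */
    static List<List<String>> iteration(String crawlDb, String segmentsDir, String segment) {
        return List.of(
                List.of("bin/nutch", "generate", crawlDb, segmentsDir),
                List.of("bin/nutch", "fetch", segment),
                List.of("bin/nutch", "parse", segment),
                List.of("bin/nutch", "updatedb", crawlDb, segment));
    }

    public static void main(String[] args) {
        String crawlDb = "crawl/crawldb";                 // hypothetical shared crawldb
        String segmentsDir = "crawl-2012-08-28/segments"; // hypothetical per-crawl directory
        String segment = segmentsDir + "/20120828153000"; // placeholder for the name generate creates
        System.out.println(String.join(" ", inject(crawlDb, "urls")));
        for (List<String> step : iteration(crawlDb, segmentsDir, segment)) {
            System.out.println(String.join(" ", step));
        }
    }
}
```

Keeping the crawldb path fixed while varying segmentsDir is what separates the crawls; updatedb is the step that feeds each segment's results back into the shared crawldb.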
The only way I see in Nutch 1.4 to do this using the crawl command is to move …
… Nutch 1.5. If it's possible, what would the crawl command look like?
Thanks for your help!
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: 24 August 2012 21:26
To: user@nutch.apache.org; alx...@aim.com
Subject: RE: recrawl a URL?
No, the CrawlDatum's status field will be set to db_notmodified if the signatures match, regardless of the HTTP headers. The header only sets a fetch_notmodified …
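To make the distinction in this reply concrete, here is a minimal sketch that models only the two rules stated in it: the fetch step turns an HTTP 304 answer into fetch_notmodified, while the crawldb status becomes db_notmodified whenever the previous and current content signatures are equal. The class, enum and method names are hypothetical, and using an MD5 digest of the raw bytes as the signature is a simplifying assumption (to our knowledge Nutch's default signature is MD5-based, but other signature implementations exist). This is not Nutch's own update code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical model of the two statuses discussed in the reply; not Nutch code. */
final class NotModifiedSketch {
    enum Status { FETCH_SUCCESS, FETCH_NOTMODIFIED, DB_FETCHED, DB_NOTMODIFIED }

    /** Fetch step: an HTTP 304 response only yields fetch_notmodified. */
    static Status fetchStatus(int httpStatusCode) {
        return httpStatusCode == 304 ? Status.FETCH_NOTMODIFIED : Status.FETCH_SUCCESS;
    }

    /** Content signature: MD5 over the raw bytes (an assumed, simplified choice). */
    static byte[] signature(byte[] content) {
        try {
            return MessageDigest.getInstance("MD5").digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    /** Update step: the db status depends only on whether the signatures match, not on HTTP headers. */
    static Status dbStatus(byte[] previousSignature, byte[] currentSignature) {
        boolean same = previousSignature != null && Arrays.equals(previousSignature, currentSignature);
        return same ? Status.DB_NOTMODIFIED : Status.DB_FETCHED;
    }

    public static void main(String[] args) {
        byte[] before = signature("<html>same page</html>".getBytes(StandardCharsets.UTF_8));
        byte[] after = signature("<html>same page</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(dbStatus(before, after)); // prints DB_NOTMODIFIED
    }
}
```

Because the comparison is on content, a server that never answers 304 can still produce db_notmodified entries, as long as the page bytes (and therefore the signature) stay the same between fetches.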
… the page has been changed since the last crawl?
Thanks,
Max
Hello everyone,
I run a crawl command every day, but I don't want Nutch to submit an update to Solr if a particular page hasn't changed. How do I achieve that? Right now, the value of db.fetch.interval.default doesn't seem to help prevent the crawl, since the updates are submitted to Solr as if …
Thank you for the reply. Does it mean that it is not supported in the latest stable release of Nutch?
-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: 24 August 2012 17:21
To: user@nutch.apache.org; Max Dzyuba
Subject: RE: recrawl a URL?
Hi,
Trunk has …
/IndexerMapReduce.java?view=markup
-----Original message-----
From: Max Dzyuba <max.dzy...@comintelli.com>
Sent: Fri 24-Aug-2012 17:35
To: Markus Jelsma <markus.jel...@openindex.io>; user@nutch.apache.org
Subject: RE: recrawl a URL?
Thank you for the reply. Does it mean that it is not supported in the latest stable …
9:02 am
Subject: RE: recrawl a URL?
Thanks again! I'll have to test it more in my 1.5.1 setup, then.
Best regards,
Max
Markus Jelsma <markus.jel...@openindex.io> wrote:
Hmm, I had to look it up, but it is supported in 1.5 and 1.5.1:
http://svn.apache.org/viewvc/nutch/tags/release-1.5.1/src/java/org
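The reply says skipping unchanged pages at indexing time already exists in 1.5 and 1.5.1, and the link points into IndexerMapReduce.java, but the thread does not name the switch. We assume it is the indexer.skip.notmodified property, which as far as we know makes the indexer drop records whose crawldb status is db_notmodified; verify the name and behaviour in the nutch-default.xml of your release. The filter below is a hypothetical illustration of that behaviour (the class, enum and record type are invented), not Nutch code.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration of skipping unchanged documents before they reach Solr. */
final class SkipNotModifiedSketch {
    enum DbStatus { DB_FETCHED, DB_NOTMODIFIED }

    /** Minimal stand-in for a crawldb entry: just a URL and its status. */
    static final class CrawlRecord {
        final String url;
        final DbStatus status;

        CrawlRecord(String url, DbStatus status) {
            this.url = url;
            this.status = status;
        }
    }

    /** URLs that would be sent to the index; skipNotModified mirrors the assumed property. */
    static List<String> urlsToIndex(List<CrawlRecord> records, boolean skipNotModified) {
        return records.stream()
                .filter(r -> !(skipNotModified && r.status == DbStatus.DB_NOTMODIFIED))
                .map(r -> r.url)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<CrawlRecord> records = List.of(
                new CrawlRecord("http://example.com/changed", DbStatus.DB_FETCHED),
                new CrawlRecord("http://example.com/unchanged", DbStatus.DB_NOTMODIFIED));
        System.out.println(urlsToIndex(records, true));  // prints [http://example.com/changed]
        System.out.println(urlsToIndex(records, false)); // both URLs
    }
}
```

With the switch off, every record is re-submitted on each daily crawl, which matches the behaviour described in the original question.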
20:14
To: user@nutch.apache.org; max.dzy...@comintelli.com
Subject: Re: recrawl a URL?
This will work only for URLs that have If-Modified-Since headers, but most URLs do not have this header.
Thanks.
Alex.
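Some background on this point: If-Modified-Since is a request header that the client sends, normally carrying the Last-Modified date the server reported on an earlier fetch. A server that supports conditional requests answers 304 Not Modified when nothing has changed; a server that never sends Last-Modified, or ignores the header, simply returns 200 with the full page. The sketch below builds such a request with the standard java.net.http API (Java 11 or later) to show the HTTP mechanism only; the class name is hypothetical, and whether a given Nutch protocol plugin sends this header is not settled in the thread.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical helper: builds an HTTP conditional GET (sending it is left out on purpose). */
final class ConditionalGetSketch {
    // HTTP date format, e.g. "Tue, 28 Aug 2012 13:24:00 GMT": always UTC, English names.
    static final DateTimeFormatter HTTP_DATE = DateTimeFormatter
            .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
            .withZone(ZoneOffset.UTC);

    static String httpDate(Instant instant) {
        return HTTP_DATE.format(instant);
    }

    /** GET that asks the server to reply 304 if the page is unchanged since lastModified. */
    static HttpRequest conditionalGet(String url, Instant lastModified) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("If-Modified-Since", httpDate(lastModified))
                .GET()
                .build();
    }

    /** 304 means "not modified"; 200 means the server sent the full page again. */
    static boolean notModified(int statusCode) {
        return statusCode == 304;
    }

    public static void main(String[] args) {
        HttpRequest req = conditionalGet("http://example.com/", Instant.parse("2012-08-28T13:24:00Z"));
        // prints Tue, 28 Aug 2012 13:24:00 GMT
        System.out.println(req.headers().firstValue("If-Modified-Since").orElse("(none)"));
    }
}
```

The request itself is cheap to build; the saving only materializes when the server actually honours the header, which is the limitation raised above.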
-----Original Message-----
From: Max Dzyuba max.dzy