Version 0.7.2 of Nutch.

On Thu, 3 Aug 2006 09:37:24 -0300, "Lourival Júnior" <[EMAIL PROTECTED]> wrote:
> Which version are you using?
>
> On 8/3/06, Nahuel ANGELINETTI <[EMAIL PROTECTED]> wrote:
> >
> > But the websites I just added haven't been crawled yet... and they
> > aren't crawled during the recrawl either...
> > Will "bin/nutch purge" restart everything?
> >
> > On Thu, 3 Aug 2006 09:21:04 -0300,
> > "Lourival Júnior" <[EMAIL PROTECTED]> wrote:
> >
> > > The Nutch configuration file conf/nutch-default.xml contains a
> > > property called db.default.fetch.interval. When you crawl a site,
> > > Nutch schedules its next fetch for "today +
> > > db.default.fetch.interval" days. If you run the recrawl command
> > > before the pages you fetched reach that date, they won't be
> > > re-fetched. When you add new URLs to the WebDB, they are ready to
> > > be fetched right away. So at this moment only those pages will be
> > > fetched by the recrawl script.
> > >
> > > I hope this helps. If I said something wrong, please correct
> > > me :)
> > >
> > > Regards
> > >
> > > On 8/3/06, Nahuel ANGELINETTI <[EMAIL PROTECTED]> wrote:
> > > >
> > > > I have another question. I did what you suggested... It
> > > > injects the new URLs and "recrawls" them, but unlike the first
> > > > crawl it doesn't download the web pages and really crawl
> > > > them... Perhaps I'm making a mistake somewhere...
> > > > Any idea?
> > > >
> > > > Regards,
> > > >
> > > > --
> > > > Nahuel ANGELINETTI
> > > >
> > > > On Thu, 3 Aug 2006 08:31:22 -0300,
> > > > "Lourival Júnior" <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi Nahuel!
> > > > >
> > > > > You can use the command bin/nutch inject $nutch-dir/db
> > > > > -urlfile urlfile.txt. To recrawl your WebDB you can use this
> > > > > script:
> > > > > <http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html>
> > > > >
> > > > > Take a look at the adddays argument and at the configuration
> > > > > property db.default.fetch.interval. Both influence the
> > > > > result.
> > > > >
> > > > > Regards!
> > > > >
> > > > > On 8/3/06, Nahuel ANGELINETTI <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I was looking for a way to add new URLs to the crawl list
> > > > > > and to recrawl all URLs...
> > > > > >
> > > > > > Can you help me?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > --
> > > > > > Nahuel ANGELINETTI
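The scheduling behaviour Lourival describes in the thread explains why only the newly injected URLs were fetched. Each fetched page becomes due again "today + db.default.fetch.interval" days later, newly injected URLs are due right away, and the adddays argument affects which pages are selected. That behaviour can be modeled in a few lines. The following is a simplified Python sketch built on those assumptions. It is not Nutch's actual (Java) code. The Page class and the inject, record_fetch and due_for_fetch functions are hypothetical names used only for illustration. Treating adddays as "move the selection cutoff forward by N days" is also an assumption, and 30 is only an example interval.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


# Hypothetical, simplified model of a WebDB page entry (not Nutch's real structure).
@dataclass
class Page:
    url: str
    next_fetch: datetime  # time at which the page becomes due for (re-)fetching


def inject(url: str, now: datetime) -> Page:
    # Assumption: newly injected URLs are due for fetching immediately.
    return Page(url, next_fetch=now)


def record_fetch(page: Page, fetched_at: datetime, fetch_interval_days: int) -> None:
    # After a fetch, the next one is scheduled "today + db.default.fetch.interval" days out.
    page.next_fetch = fetched_at + timedelta(days=fetch_interval_days)


def due_for_fetch(pages: List[Page], now: datetime, add_days: int = 0) -> List[str]:
    # Assumption: adddays pushes the cutoff forward by add_days, so pages
    # scheduled within that extra window are selected as well.
    cutoff = now + timedelta(days=add_days)
    return [p.url for p in pages if p.next_fetch <= cutoff]


now = datetime(2006, 8, 3)
interval = 30  # example value for db.default.fetch.interval

old = inject("http://example.com/", now - timedelta(days=1))
record_fetch(old, now - timedelta(days=1), interval)  # crawled yesterday
new = inject("http://example.org/", now)              # injected today

print(due_for_fetch([old, new], now))               # ['http://example.org/']
print(due_for_fetch([old, new], now, add_days=30))  # ['http://example.com/', 'http://example.org/']
```

In this model, a recrawl run right after injecting selects only the new URL, because the page crawled yesterday is not due for another 30 days. Passing a large enough add_days brings it back into the selection.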