2007/08/09
All,
Does anyone have an updated recrawl script for 0.9?
Also, does anyone have a link that describes each phase of a crawl /
recrawl (for 0.9)?
It looks like it changes with each version. I searched the wiki, but I am
still unclear.
thanks
-- Brian Demers
2006/07/14
Does anyone have a good Intranet recrawl script for nutch-0.8.0? Thanks..
Matt
-- Matthew Holt
2006/08/03
Hello,
I am looking for a way to add new URLs to the crawl URL list
and to recrawl all URLs...
Can you help me?
thanks,
-- Nahuel ANGELINETTI
2006/12/29
hi,
I'm new to Nutch. I have crawled my website, but how can I recrawl/refresh the
index without deleting the crawl folder?
kind regards
frank
-- Otto, Frank
2006/07/20
I sent out a few emails regarding a recrawl script I wrote. However, if
it'd be easier for anyone to help, can you please check that all of the
below steps are the only ones that need to be taken to recrawl? Or if
there is a resource online that describes manually -- Matthew Holt
2009/05/14
Thanks for this information about recrawling.
I am running a recrawl, but every time I do it, I don't get the
same results as the first crawl (different documents, not the same web
pages). So how can I recrawl the same pages?
Maybe fix the property -- aidahaj
2007/05/05
Hi,
I crawled a website. Around 500 out of 5000 pages generated
errors/exceptions. I would like to recrawl only these 500 pages. The errors
appear to be something similar to this:
Segment#1: 0 errors
Segment#2: 120 errors
Segment#3: 10 errors
Segment#4: 370 errors
Segment#5: 0 errors
Q1 -- karthik085
2008/06/06
http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: scottyd [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, June 5, 2008 2:44:21 PM
Subject: recrawl in 1.0
I was wondering how to accomplish a recrawl in the trunk release of nutch -- ogjunk-nutch
2008/11/07
Hi,
After a recrawl, the app server has to be restarted before the
new indexes are reflected.
If the web app points to the folder being recrawled, a
folder named merge-output is created inside the index folder once the
recrawl is completed.
Is there any way -- shree lakshmi
2009/12/16
hi,
I just want to know the difference between an initial crawl and a recrawl
using the fetch, generate, update commands.
Is there a difference in time between running an initial crawl every time (by
deleting the crawl_folder) and running a recrawl without deleting the initial
crawl -- BELLINI ADAM