[Nutch-general] Re: Admin Gui beta test (was Re: ATB: Heritrix)

2006-05-02 Thread Karsten Dello
Hi Stefan, did you find a solution? I'd really like to give the admin gui a try. Cheers Karsten PS: My offer to host that file is still open :-) Stefan Groschupf wrote: I think it should be possible to put your binary at the Apache site, probably Doug will be the right person to talk

[Nutch-general] Re: Nutch Admin Gui Mirror

2006-05-03 Thread Karsten Dello
Hello, 2006/5/3, Stefan Groschupf [EMAIL PROTECTED]: As soon as you have the file uploaded to your server, please publish the link immediately on the nutch user mailing list to load balance the traffic as much as we can. I have put your file here:

[Nutch-general] limited depth with internet crawl?

2006-05-31 Thread Karsten Dello
Hello, I have used the intranet crawl for the following simple task: Given a list of relevant starturls, get all documents within the reach of two clicks. We use this mechanism for monitoring a couple of dozen lists on the internet. This was easy using the -depth parameter of the crawl tool.
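
The "within the reach of two clicks" requirement above amounts to a depth-limited breadth-first traversal of the link graph. The following is a minimal sketch of that idea only, not Nutch's implementation: the `links` dictionary and the function name `reachable_within` are hypothetical, and the way Nutch itself counts `-depth` rounds may differ from the hop count used here.

```python
from collections import deque


def reachable_within(links, seeds, max_depth):
    """Return {url: depth} for every URL reachable from the seeds
    in at most max_depth link hops (seeds themselves have depth 0).

    `links` is a hypothetical in-memory link graph: url -> list of outlinks.
    """
    depth_of = {url: 0 for url in seeds}
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        # Do not follow outlinks of pages that already sit at the limit.
        if depth_of[url] == max_depth:
            continue
        for target in links.get(url, ()):
            if target not in depth_of:
                depth_of[target] = depth_of[url] + 1
                queue.append(target)
    return depth_of


# Hypothetical example graph: a -> b -> c -> d
example_links = {"a": ["b"], "b": ["c"], "c": ["d"]}
two_clicks = reachable_within(example_links, ["a"], 2)
```

With the example graph, `d` is three hops away from the seed `a` and is therefore left out of the result.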

[Nutch-general] parsing and using xml-data

2006-06-08 Thread Karsten Dello
Dear list, I would like to process metadata from publication repositories into a nutch index. The metadata comes as xml (OAI_PMH to be more precise). The starting URLs look like http://oai_host/servlet?method=getRecords&set=someSet These requests return lists, which basically look like list

Re: [Nutch-general] Added 0 pages

2006-07-13 Thread Karsten Dello
Hi, in my opinion Julius Schorzman wrote: http://www.apache.com is not matched by the regex +^http://([a-z0-9]*\.)*apache.com/ because it does not end with a slash. Cheers Karsten
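
The point about the trailing slash can be checked directly. This is a small sketch using Python's `re` module rather than Nutch's own URL filter code; the helper name `is_accepted` is hypothetical. The leading `+` in the filter line marks an include rule and is not part of the regular expression, so it is dropped here.

```python
import re

# The expression from the filter rule, without the leading "+" include marker.
PATTERN = re.compile(r"^http://([a-z0-9]*\.)*apache.com/")


def is_accepted(url):
    """Return True if the pattern is found in the URL (hypothetical helper)."""
    return PATTERN.search(url) is not None


without_slash = is_accepted("http://www.apache.com")
with_slash = is_accepted("http://www.apache.com/")
```

Here `without_slash` is `False` and `with_slash` is `True`: the pattern requires a `/` after the host name, so a seed URL written without it is rejected.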

Re: [Nutch-general] Starting Nutch in init.d?

2006-08-16 Thread Karsten Dello
Hi Bill, this starts the process as root? That shouldn't be necessary. One recommended way is to run tomcat as a daemon using jsvc, see http://tomcat.apache.org/tomcat-5.5-doc/setup.html Works fine for me. Cheers Karsten Bill Goffe wrote: Last month I mentioned that I was having problems

Re: [Nutch-general] uploading the nutch war file

2006-08-19 Thread Karsten Dello
Have you enabled the tomcat manager application at all? If not, see http://tomcat.apache.org/tomcat-5.0-doc/manager-howto.html There are easier ways to deploy a war file, anyway. Cheers Karsten

[Nutch-general] Unsolved: Problem with fetching

2006-12-11 Thread Karsten Dello
Any help is still very much appreciated! Best regards Karsten -- Forwarded message -- From: Karsten Dello [EMAIL PROTECTED] Date: 06.12.2006 02:44 Subject: Problem with fetching (cont.) To: nutch-user@lucene.apache.org Sorry, the mail I just sent was incomplete

[Nutch-general] parse process hangs

2007-07-06 Thread Karsten Dello
hello, i just migrated from 0.8.1 to 0.9 and ran into a problem with parsing (we do parsing after fetching) of a 50-page segment. the process is using 0% cpu, but a lot of memory (it goes on like that for hours). it seems to be stalled according to the logfiles. PID USER PR NI VIRT RES

[Nutch-general] Fetching HTTPS behind Proxy fails - Patch exists, but is not included in 0.9

2007-07-29 Thread Karsten Dello
Hi, fetching via https doesn't work with protocol-httpclient if a proxy is used. The patch attached to http://issues.apache.org/jira/browse/NUTCH-126 solved this problem 1 1/2 years ago, but it is in neither the 0.9 release nor trunk: