Re: NewbieNutcher.....

2005-10-04 Thread Jeff Pettenski
Niclas, add -showThreadID -logLevel FINE to the fetch. The way you are calling the fetch (without the -noParsing switch) tells it to parse within the fetch step. You could instead use the -noParsing switch and then run: ./nutch parse $segDir -showThreadID -logLevel FINE This will break up the fetch

jobdetails.jsp crash

2005-10-04 Thread Rod Taylor
It seems this screen doesn't like to be visited when there are no jobs running. Not critical, but I thought I would mention it. 051004 091344 /jobdetails.jsp: java.lang.NullPointerException at org.apache.jsp.jobdetails_jsp._jspService(jobdetails_jsp.java:68) at

Re: mapred Sort Progress Reports

2005-10-04 Thread Rod Taylor
On Mon, 2005-10-03 at 15:57 -0700, Doug Cutting wrote: Try the following on your system: bin/nutch org.apache.nutch.io.TestSequenceFile -fast -count 2000 -megabytes 100 foo Tell me how it behaves during the sort phase. We changed interpreters on some machines to see if that would

Re: a simple map reduce tutorial

2005-10-04 Thread Doug Cutting
Earl Cahill wrote: 1. Sounds like some of you have some glue programs that help run the whole process. Are these going to end up in subversion sometime? I am guessing there is much duplicated effort. I'm not sure what you mean. I set environment variables in my .bashrc, then simply use

Always getting Impossible condition now...

2005-10-04 Thread Denis Haskin
I've been trying to do some experimentation with nutch 0.7.1 (this is on Windows 2000). I set things up to crawl a local drive (well, actually a network-mapped drive) and it seemed to work fine. I let it run for a bit but then aborted it because I wanted to adjust something. I deleted all the

Re: mapred Sort Progress Reports

2005-10-04 Thread Doug Cutting
Rod Taylor wrote: Tell me how it behaves during the sort phase. I ran 8 jobs simultaneously. Very high await time (1200) and it was doing about 22MB/sec data writes. Nearly 0 reads from disk (everything would be cached in memory). This is during the sort part? This first writes a big file,

What is the minimal cygwin install needed for nutch?

2005-10-04 Thread CHAFFEE Todd Consultant
I would like to install nutch on a WinXP laptop. What is the minimal cygwin installation I need for nutch to work? Thanks in advance for any help in answering this question.

Re: mapred Sort Progress Reports

2005-10-04 Thread Rod Taylor
On Tue, 2005-10-04 at 09:52 -0700, Doug Cutting wrote: Rod Taylor wrote: Tell me how it behaves during the sort phase. I ran 8 jobs simultaneously. Very high await time (1200) and it was doing about 22MB/sec data writes. Nearly 0 reads from disk (everything would be cached in memory).

Re: Always getting Impossible condition now...

2005-10-04 Thread Gal Nitzan
Denis Haskin wrote: I've been trying to do some experimentation with nutch 0.7.1 (this is on Windows 2000). I set things up to crawl a local drive (well, actually a network-mapped drive) and it seemed to work fine. I let it run for a bit but then aborted it because I wanted to adjust

Re: Always getting Impossible condition now...

2005-10-04 Thread Denis Haskin
I was deleting the whole crawl-2005... etc directory tree (any of them that I have). I still get the error. Thanks, dwh Gal Nitzan wrote: Just delete D:\workspaces\work\nutch-0.7.1\crawl-20051004120328\db\webdb.old Gal

Re: a simple map reduce tutorial

2005-10-04 Thread Jérôme Charron
I think it would be better to have the junit tests start jetty then crawl localhost. I'd love to see some end-to-end unit tests like that. +1 I think this would also make it nice to test things like recursive linking, parsing pdfs or other file formats, observing robots.txt or any

Re: a simple map reduce tutorial

2005-10-04 Thread Earl Cahill
I think end-to-end testing must focus on end-to-end problems (i.e., PDF parsing is already checked by unit tests, and that is really the right place for it). Hate to say it, but today was the first time I got ant test to work (hadn't tried too hard), and yeah, I saw several such

Nutch Datanode Exceptions

2005-10-04 Thread Rod Taylor
This datanode had been running for about 9 hours before it started running into trouble. 051004 180521 3370 Received block blk_5675690077943834423 from /192.168.100.14 051004 180522 3371 Received block blk_-4194027887562402267 from /192.168.100.14 051004 180549 3372 Received block