Niclas,
Add -showThreadID -logLevel FINE to the fetch.
The way you are calling the fetch (without the -noParsing switch) tells the
fetch to parse within the fetch step.
You could use the -noParsing switch and add:
./nutch parse $segDir -showThreadID -logLevel FINE
This will break up the fetch
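A rough sketch of the two-step run suggested above, in case it helps. The wrapper names (run, fetch_then_parse), the NUTCH and DRY_RUN variables, and the placement of the switches after the segment directory on the fetch line are assumptions for illustration; only the switches themselves and the parse line come from the message. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: split the fetch into a fetch step and a separate parse step.
# NUTCH and DRY_RUN are hypothetical helpers, not part of nutch itself.
NUTCH="${NUTCH:-./nutch}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = actually run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

fetch_then_parse() {
    segDir="$1"
    # Step 1: fetch only; -noParsing keeps parsing out of the fetch step.
    # (Switch order after the segment directory is assumed, mirroring the parse line.)
    run "$NUTCH" fetch "$segDir" -noParsing -showThreadID -logLevel FINE || return 1
    # Step 2: parse the fetched segment as its own step.
    run "$NUTCH" parse "$segDir" -showThreadID -logLevel FINE
}
```

Calling fetch_then_parse "$segDir" with DRY_RUN=0 would run both steps in order and stop if the fetch fails.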
It seems this screen doesn't like to be visited when there are no jobs
running. Not critical, but I thought I would mention it.
051004 091344 /jobdetails.jsp:
java.lang.NullPointerException
at org.apache.jsp.jobdetails_jsp._jspService(jobdetails_jsp.java:68)
at
On Mon, 2005-10-03 at 15:57 -0700, Doug Cutting wrote:
Try the following on your system:
bin/nutch org.apache.nutch.io.TestSequenceFile -fast -count 2000 -megabytes 100 foo
Tell me how it behaves during the sort phase.
We changed interpreters on some machines to see if that would
Earl Cahill wrote:
1. Sounds like some of you have some glue programs
that help run the whole process. Are these going to
end up in subversion sometime? I am guessing there is
much duplicated effort.
I'm not sure what you mean. I set environment variables in my .bashrc,
then simply use
I've been trying to do some experimentation with nutch 0.7.1 (this is on
Windows 2000).
I set things up to crawl a local drive (well, actually a network mapped
drive) and it seemed to work fine. I let it run for a bit but then aborted
it because I wanted to adjust something.
I deleted all the
Rod Taylor wrote:
Tell me how it behaves during the sort phase.
I ran 8 jobs simultaneously. Very high await time (1200) and it was
doing about 22MB/sec data writes. Nearly 0 reads from disk (everything
would be cached in memory).
This is during the sort part? This first writes a big file,
I would like to install nutch on a WinXP laptop. What is the minimal
cygwin installation I need for nutch to work?
Thanks in advance for any help in answering this question.
On Tue, 2005-10-04 at 09:52 -0700, Doug Cutting wrote:
Rod Taylor wrote:
Tell me how it behaves during the sort phase.
I ran 8 jobs simultaneously. Very high await time (1200) and it was
doing about 22MB/sec data writes. Nearly 0 reads from disk (everything
would be cached in memory).
Denis Haskin wrote:
I've been trying to do some experimentation with nutch 0.7.1 (this is
on Windows 2000).
I set things up to crawl a local drive (well, actually a network
mapped drive) and it seemed to work fine. I let it run for a bit but
then aborted it because I wanted to adjust
I was deleting the whole crawl-2005... etc directory tree (any of them
that I have). I still get the error.
Thanks,
dwh
Gal Nitzan wrote:
Just delete
D:\workspaces\work\nutch-0.7.1\crawl-20051004120328\db\webdb.old
Gal
I think it would be better to have the junit tests
start jetty then
crawl localhost. I'd love to see some end-to-end
unit tests like that.
+1
I think this would also make it nice to test things
like recursive linking, parsing pdfs or other file
formats, observing robots.txt or any
I think end to end testing must focus on end to end
problems (i.e. checking
pdf parsing is already checked by unit tests, and it
is really the right place for doing it).
Hate to say it, but today was the first time I got ant
test to work (hadn't tried too hard), and yeah, I saw
several such
This datanode had been running for about 9 hours until it started
running into troubles.
051004 180521 3370 Received block blk_5675690077943834423 from /192.168.100.14
051004 180522 3371 Received block blk_-4194027887562402267 from /192.168.100.14
051004 180549 3372 Received block