Thanks for your reply, but I'm afraid this solution
doesn't work for me. Actually, I didn't use this
parameter (I removed it from the nutch script).
BTW: my RAM is 4G. I use Red Hat; the kernel is
2.4.20-31.9bigmem.
Have you ever gotten an out-of-memory exception when
you used Nutch to crawl millions of URLs?
If you use the default nutch script, I would set a
NUTCH_HEAPSIZE of 2000. That generally works for me,
and I have over 100 million URLs in the db and generally
10 million URLs per segment/index.
-byron
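The stock bin/nutch script reads NUTCH_HEAPSIZE (a value in megabytes) and uses it to build the JVM's -Xmx flag, so the suggestion above could be applied like this; the crawl arguments shown are just an example invocation, not part of the original advice:

```shell
# NUTCH_HEAPSIZE is in MB; 2000 becomes -Xmx2000m in bin/nutch.
export NUTCH_HEAPSIZE=2000

# Example invocation; substitute your own seed dir and options.
bin/nutch crawl urls -dir crawl -depth 3
```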
--- smith learner [EMAIL PROTECTED] wrote:
Thanks for your reply. But I guess this solution
doesn't work for me.
If that doesn't help, try forcing the garbage collector to run more
often. Everything slows down (unless you are running on multiple CPUs),
but it's worth the memory.
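One way to make the collector run more often is to shrink the young generation with -Xmn, which triggers more frequent minor collections; -verbose:gc lets you watch the effect. This is a sketch under the assumption that your copy of bin/nutch passes NUTCH_OPTS through to the java command; if it doesn't, the same flags can be added to the java invocation inside the script directly:

```shell
# Assumption: bin/nutch appends $NUTCH_OPTS to the java command line.
# A small young generation (-Xmn) forces more frequent minor GCs;
# -verbose:gc prints each collection so you can see the trade-off.
export NUTCH_HEAPSIZE=2000
export NUTCH_OPTS="-Xmn128m -verbose:gc"
bin/nutch crawl urls -dir crawl -depth 3
```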
Byron Miller wrote:
If you use the default nutch script i would set a
NUTCH_HEAPSIZE of 2000. That generally works for me.