Hi
Could you also send a stack trace? It is not clear which component is
running out of memory.
If it is the name-node, then you should check how many files, directories,
and blocks there are at the time of the failure.
If your crawl generates a lot of small files, that could be the cause.
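For example, a quick way to get those counts is to run fsck on the root
path (assuming fsck behaves the same in your release; the exact output
format may differ):

  bin/hadoop fsck /
  # the summary at the end lists "Total dirs", "Total files",
  # and "Total blocks"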
Let us know.
--Konstantin
Uygar BAYAR wrote:
Hi,
We have a 4-machine cluster (dual-core 3.20GHz CPU, 2GB RAM, 400GB disk).
We use Nutch 0.9 and Hadoop 0.13.1. We are trying to crawl the web (60K
sites) to a depth of 5.
When we reached the parse of the 4th segment, every machine failed with
java.lang.OutOfMemoryError: Requested array size exceeds VM limit. Our
segment size is:
crawled/segments/20071002163239   3472754178
I tried several map/reduce configurations, but nothing changed (400-50;
300-15; 50-15; 100-15; 200-35).
I also set the heap size to 2000M in hadoop-env and in the nutch script.
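Roughly what I changed is sketched below (variable names may differ
slightly between Hadoop and Nutch releases, so treat this as an
illustration rather than the exact config):

  # conf/hadoop-env.sh -- heap size for the Hadoop daemons, in MB
  export HADOOP_HEAPSIZE=2000

  # bin/nutch -- heap size for the Nutch client JVM, in MB
  NUTCH_HEAPSIZE=2000

I am not sure whether the map/reduce child tasks pick up this heap size,
or whether they use a separate setting (e.g. mapred.child.java.opts in
hadoop-site.xml).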