hi
I found the problem. With the settings in nutch-site.xml it parsed almost everything;
the failures were because the JVM memory limit was exceeded.
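The usual knobs for this, assuming the stock Nutch 0.9 / Hadoop 0.13 property names, are
http.content.limit in nutch-site.xml (caps how much of each fetched page the parser ever
sees) and mapred.child.java.opts in hadoop-site.xml (raises the per-task JVM heap). A
minimal sketch, with illustrative values:

  <!-- nutch-site.xml: truncate oversized pages before parsing -->
  <property>
    <name>http.content.limit</name>
    <value>65536</value>
  </property>

  <!-- hadoop-site.xml: larger heap for map/reduce child JVMs -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>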
Uygar BAYAR wrote:
>
> hi
> we have a 4-machine cluster (each with a dual-core 3.20GHz CPU, 2GB RAM, and a 400GB
> disk). We use Nutch 0.9 and Hadoop 0.13.1. We are trying to crawl the web (60K sites)
> to a depth of 5.
>
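For context, a crawl of that shape is typically launched with Nutch's one-shot crawl
command; the urls seed directory, output directory, and -topN value below are illustrative:

  bin/nutch crawl urls -dir crawled -depth 5 -topN 60000

Each of the 5 depth rounds then produces one timestamped segment under crawled/segments/.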
hi
It's not the name-node; there is a single segment. Before the parse step I
reduced the fetch list by a factor of 10.
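That reduction maps to the generate step's -topN cap; a minimal sketch, with
illustrative paths and count:

  bin/nutch generate crawled/crawldb crawled/segments -topN 6000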
Here is the call stack and the files to be parsed; sorry for the long log.
/user/nutch/sirketce/crawled/segments/20071002163239/content
/user/nutch/sirketce/crawled/segments/20071002163239/content/part-0
Hi
Could you also send the call stack? It is not clear which component is out
of memory.
If it is the name-node, then you should check how many files, directories, and
blocks there are at the time of failure.
If your crawl generates a lot of small files, that could be the cause.
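One quick way to check, assuming the standard HDFS tooling, is fsck, which prints
the total number of files, directories, and blocks under a path:

  bin/hadoop fsck /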
Let us know.
--Konstantin