If you use the default nutch script, I would set
NUTCH_HEAPSIZE to 2000. That generally works for me,
and I have over 100 million URLs in the db and generally
10 million URLs per segment/index.
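
For reference, a minimal sketch of how that setting is assumed to reach the JVM: the stock bin/nutch wrapper is believed to read NUTCH_HEAPSIZE as a size in MB and pass it on as -Xmx. The heap_flag helper and the 1000 MB fallback below are illustrative assumptions, not copied from the script, so check your own copy of bin/nutch.

```shell
# Hypothetical helper: turn a heap size in MB into a JVM -Xmx flag,
# mirroring what the bin/nutch wrapper is assumed to do internally.
heap_flag() {
  # Fall back to 1000 MB when no size is given (assumed default).
  echo "-Xmx${1:-1000}m"
}

# Export the variable so a wrapper started from this shell can see it.
export NUTCH_HEAPSIZE=2000

# Build the flag from the exported value and show it.
JAVA_HEAP_MAX=$(heap_flag "$NUTCH_HEAPSIZE")
echo "JVM heap flag: $JAVA_HEAP_MAX"
```

If the variable has been removed from the script, as described further down the thread, exporting it has no effect and -Xmx has to be passed to java directly.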

-byron

--- smith learner <[EMAIL PROTECTED]> wrote:
> Thanks for your reply. But I guess this solution
> doesn't work for me. Actually, I didn't use this
> parameter (I removed it from the nutch script).
> 
> BTW: I have 4 GB of RAM and run Red Hat; the kernel is
> 2.4.20-31.9bigmem.
> 
> Have you ever gotten an out-of-memory exception when
> using Nutch to crawl millions of websites?
> 
> Regards,
> 
> Jack.
> 
> 
> --- cao yuzhong <[EMAIL PROTECTED]> wrote:
> > Changing the JVM parameter -Xmx may help you.
> > 
> > >From: smith learner <[EMAIL PROTECTED]>
> > >Reply-To: nutch-user@incubator.apache.org
> > >To: nutch-user@incubator.apache.org
> > >Subject: out of memory exception.
> > >Date: Fri, 22 Apr 2005 12:44:04 -0700 (PDT)
> > >
> > >I used Nutch 0.6 to crawl a million websites. After it
> > >had fetched around 2.5 million web pages, it always threw
> > >an out-of-memory exception. I caught the exception and
> > >tried to print the stack trace, but all it printed was:
> > >
> > >Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> > >
> > >The strange thing is that when I use Nutch 0.5,
> > >it can fetch more than 7 million web pages.
> > >
> > >I don't know what is happening here. Can anybody shed
> > >light on this?
> > >
> > >Thanks in advance!
> > >
> > >Regards,
> > >
> > >smith.
> > _______________________________________________
> > Nutch-general mailing list
> > Nutch-general@lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nutch-general