Re: nutch crawl command takes 98% of cpu

2011-03-14 Thread alxsss
Hello, Which version is this patch applicable to? Thanks. Alex. -Original Message- From: Alexis alexis.detregl...@gmail.com To: user user@nutch.apache.org Sent: Tue, Feb 8, 2011 9:59 am Subject: Re: nutch crawl command takes 98% of cpu Hi, Thanks for all the feedback

Re: nutch crawl command takes 98% of cpu

2011-02-08 Thread Alexis
Hi, Thanks for all the feedback. It looks like there is not much you can do if you give the FLV parser some corrupted data. From a practical point of view, this is extremely annoying, as it takes up all the CPU resources and prevents other threads from performing their tasks properly,
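
One common way to keep a misbehaving parser from holding up its caller is to run the parse on a separate thread and stop waiting after a deadline. The sketch below is only an illustration of that idea in plain Java; `ParseWithTimeout` and `runWithTimeout` are hypothetical names, not part of Nutch or Tika, and this is not necessarily what the patch discussed in this thread does. Note the limitation: `cancel(true)` only interrupts the worker, so a parser spinning in a loop that never checks for interruption will keep using CPU until the JVM exits.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not a Nutch or Tika class): run a task with a deadline.
public class ParseWithTimeout {

    // Returns the task's result, or `fallback` if the task fails or
    // does not finish within `timeoutMillis`.
    static <T> T runWithTimeout(Callable<T> task, long timeoutMillis, T fallback) {
        // Daemon thread, so a stuck task cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parser");
            t.setDaemon(true);
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Stop waiting and interrupt the worker; this does not
            // forcibly stop code that ignores interruption.
            future.cancel(true);
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            // The task itself threw, e.g. on corrupted input.
            return fallback;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a slow parse: sleeps far longer than the deadline.
        String result = runWithTimeout(() -> {
            Thread.sleep(5_000);
            return "parsed";
        }, 200, "timed out");
        System.out.println(result);
    }
}
```

The caller gets a fallback value and can move on to the next document; whether the stuck work actually stops depends on the parser responding to interruption.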

Re: nutch crawl command takes 98% of cpu

2011-02-07 Thread Ken Krugler
Hi Kirby & others, On Jan 31, 2011, at 4:39pm, Kirby Bohling wrote: On Sat, Jan 29, 2011 at 9:03 AM, Ken Krugler kkrugler_li...@transpac.com wrote: Some comments below. On Jan 29, 2011, at 5:55am, Julien Nioche wrote: Hi, This shows the state of the various threads within a Java process.

Re: nutch crawl command takes 98% of cpu

2011-02-01 Thread Andrzej Bialecki
On 2/1/11 1:39 AM, Kirby Bohling wrote: On Sat, Jan 29, 2011 at 9:03 AM, Ken Krugler kkrugler_li...@transpac.com wrote: Some comments below. On Jan 29, 2011, at 5:55am, Julien Nioche wrote: Hi, This shows the state of the various threads within a Java process. Most of them seem to be busy

Re: nutch crawl command takes 98% of cpu

2011-01-31 Thread Kirby Bohling
On Sat, Jan 29, 2011 at 9:03 AM, Ken Krugler kkrugler_li...@transpac.com wrote: Some comments below. On Jan 29, 2011, at 5:55am, Julien Nioche wrote: Hi, This shows the state of the various threads within a Java process. Most of them seem to be busy parsing zip archives with Tika. The
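
The thread states mentioned above are the kind of information a thread dump shows (for example, one taken with the JDK's `jstack` tool). As a rough illustration only, the same information can be read from inside a JVM with the standard `Thread.getAllStackTraces()` call. The class name `ThreadStates` and its method are hypothetical and not taken from the thread; the sketch inspects its own JVM, not a running Nutch process.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper: summarise what the threads of this JVM are doing.
public class ThreadStates {

    // Count live threads per state (RUNNABLE, WAITING, BLOCKED, ...).
    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print one line per thread: name, state, and the top stack frame,
        // which is usually enough to see which code a busy thread is in.
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            StackTraceElement[] stack = e.getValue();
            String top = stack.length > 0 ? stack[0].toString() : "(no frames)";
            System.out.printf("%-30s %-14s %s%n", t.getName(), t.getState(), top);
        }
        System.out.println(countByState());
    }
}
```

Many threads in the RUNNABLE state with the same parser class at the top of the stack would match the situation described in the message.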

RE: nutch crawl command takes 98% of cpu

2011-01-27 Thread Chris Woolum
If you are looking at the tasktracker control panel, what does it show? The link is http://localhost:50030 -Original Message- From: alx...@aim.com [mailto:alx...@aim.com] Sent: Thursday, January 27, 2011 3:01 PM To: user@nutch.apache.org Subject: nutch crawl command takes 98% of cpu

Re: nutch crawl command takes 98% of cpu

2011-01-27 Thread Alexis
Hi, I ran into the same issue with Nutch 1.2. You can fix it by upgrading the Tika parser to at least version 0.8. The lib can be found in the plugins/parse-tika/ directory of your Nutch release. This has already been mentioned twice on the mailing list: See