Re: [Nutch-general] NullPointerException fetching some sites with temp redirects
I'll try those if I get a chance. (BTW Remuneration is misspelled on absoluteit.co.nz if you care)

--Kai M.

----- Original Message -----
From: Carl Cerecke [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, July 26, 2007 4:21:07 PM
Subject: Re: NullPointerException fetching some sites with temp redirects

Is anybody else getting NullPointerExceptions fetching either of these two sites (0.90 and latest from trunk)?

http://www.absoluteit.co.nz
http://defence.allmedia.co.nz

I am, but would be grateful if someone else could test whether they work or not so I can eliminate nutch configuration issues.

Cheers,
Carl.

Carl Cerecke wrote:
> Hi,
>
> Using nutch 0.9, although I get the same with a more recent nightly build.
>
> I'm getting NPE fetching these two pages:
> http://www.absoluteit.co.nz and
> http://defence.allmedia.co.nz
>
> I've tracked it down by putting a t.printStackTrace() in the catch (Throwable t) of the run() in Fetcher.java:
>
> java.lang.NullPointerException
>     at org.apache.hadoop.io.Text.encode(Text.java:375)
>     at org.apache.hadoop.io.Text.encode(Text.java:356)
>     at org.apache.hadoop.io.Text.writeString(Text.java:396)
>     at org.apache.nutch.protocol.Content.writeCompressed(Content.java:146)
>     at org.apache.hadoop.io.CompressedWritable.write(CompressedWritable.java:74)
>     at org.apache.nutch.fetcher.FetcherOutput.write(FetcherOutput.java:56)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:315)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:343)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:191)
>
> I'm not sure where to go from here. Any suggestions?
>
> Cheers,
> Carl.

_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general
Re: [Nutch-general] NullPointerException fetching some sites with temp redirects
Is anybody else getting NullPointerExceptions fetching either of these two sites (0.90 and latest from trunk)?

http://www.absoluteit.co.nz
http://defence.allmedia.co.nz

I am, but would be grateful if someone else could test whether they work or not so I can eliminate nutch configuration issues.

Cheers,
Carl.

Carl Cerecke wrote:
> Hi,
>
> Using nutch 0.9, although I get the same with a more recent nightly build.
>
> I'm getting NPE fetching these two pages:
> http://www.absoluteit.co.nz and
> http://defence.allmedia.co.nz
>
> I've tracked it down by putting a t.printStackTrace() in the catch (Throwable t) of the run() in Fetcher.java:
>
> java.lang.NullPointerException
>     at org.apache.hadoop.io.Text.encode(Text.java:375)
>     at org.apache.hadoop.io.Text.encode(Text.java:356)
>     at org.apache.hadoop.io.Text.writeString(Text.java:396)
>     at org.apache.nutch.protocol.Content.writeCompressed(Content.java:146)
>     at org.apache.hadoop.io.CompressedWritable.write(CompressedWritable.java:74)
>     at org.apache.nutch.fetcher.FetcherOutput.write(FetcherOutput.java:56)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:315)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:343)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:191)
>
> I'm not sure where to go from here. Any suggestions?
>
> Cheers,
> Carl.
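The debugging step Carl describes (printing the full stack trace from a catch (Throwable t) in a worker thread's run()) can be sketched roughly as follows. This is only an illustration: TracingWorker, its task field, and lastTrace() are hypothetical names made up for this example, and none of it is taken from Fetcher.java.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Hypothetical stand-in for a fetcher worker thread; not Nutch code. */
final class TracingWorker extends Thread {
    private final Runnable task;       // the work that may fail (assumed for this sketch)
    private volatile String lastTrace; // full trace of the last failure, or null

    TracingWorker(Runnable task) {
        this.task = task;
    }

    @Override
    public void run() {
        try {
            task.run();
        } catch (Throwable t) {
            // Print the whole trace rather than only t.toString(),
            // so the frame that actually threw becomes visible.
            t.printStackTrace();
            StringWriter sw = new StringWriter();
            t.printStackTrace(new PrintWriter(sw, true));
            lastTrace = sw.toString();
        }
    }

    /** Returns the captured trace, or null if the task completed normally. */
    String lastTrace() {
        return lastTrace;
    }

    public static void main(String[] args) throws InterruptedException {
        String contentType = null; // simulate a missing value
        TracingWorker worker = new TracingWorker(() -> contentType.length());
        worker.start();
        worker.join();
        System.out.println("trace captured: " + (worker.lastTrace() != null));
    }
}
```

Keeping the trace as a string as well as printing it is a convenience for this sketch, so that the failure can be inspected after the thread has finished.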
Re: [Nutch-general] NullPointerException fetching some sites with temp redirects
Hi,

On 7/25/07, Carl Cerecke [EMAIL PROTECTED] wrote:
> Hi,
>
> Using nutch 0.9, although I get the same with a more recent nightly build.
>
> I'm getting NPE fetching these two pages:
> http://www.absoluteit.co.nz and
> http://defence.allmedia.co.nz
>
> I've tracked it down by putting a t.printStackTrace() in the catch (Throwable t) of the run() in Fetcher.java:
>
> java.lang.NullPointerException
>     at org.apache.hadoop.io.Text.encode(Text.java:375)
>     at org.apache.hadoop.io.Text.encode(Text.java:356)
>     at org.apache.hadoop.io.Text.writeString(Text.java:396)
>     at org.apache.nutch.protocol.Content.writeCompressed(Content.java:146)
>     at org.apache.hadoop.io.CompressedWritable.write(CompressedWritable.java:74)
>     at org.apache.nutch.fetcher.FetcherOutput.write(FetcherOutput.java:56)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:315)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:343)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:191)
>
> I'm not sure where to go from here. Any suggestions?

Can you retry with the latest trunk? Not that I think it will solve your problem, but Content.java has changed recently, so I am not sure what was in line 146. So, if the problem reoccurs with the latest trunk, I can check exactly which line is failing. Alternatively, you can send that part of Content.java's code.

> Cheers,
> Carl.

--
Doğacan Güney
Re: [Nutch-general] NullPointerException fetching some sites with temp redirects
Hi Doğacan,

Yes, I get the NullPointerException with the latest trunk, too.

Cheers,
Carl.
Re: [Nutch-general] NullPointerException fetching some sites with temp redirects
Hi,

Included Content.java. Will retry with latest trunk shortly.

Content.java:137-149

    137  protected final void writeCompressed(DataOutput out) throws IOException {
    138    out.writeByte(VERSION);
    139
    140    Text.writeString(out, url); // write url
    141    Text.writeString(out, base); // write base
    142
    143    out.writeInt(content.length); // write content
    144    out.write(content);
    145
    146    Text.writeString(out, contentType); // write contentType
    147
    148    metadata.write(out); // write metadata
    149  }

I also noticed in the output.collect call in Fetcher.java a new FetcherOutput is created with the third argument (ParseImpl) as null even though the Content argument is not null (it is the contents of the page that is redirected to).

Cheers,
Carl.
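Taken together, the stack trace and the Content.java excerpt in this thread suggest (but do not prove) that contentType was null when line 146 ran, since that line passes contentType to Text.writeString and the exception is thrown inside Text.encode. The sketch below reproduces that failure mode with a simplified stand-in for a length-prefixed string writer. The writeString here is not Hadoop's implementation, and writeStringOrEmpty is a hypothetical guard shown only for illustration, not the fix adopted by the project.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class NullContentTypeDemo {
    /** Simplified stand-in for a length-prefixed string writer (NOT Hadoop's Text.writeString). */
    static void writeString(DataOutput out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8); // NullPointerException if s is null
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    /** Hypothetical guard: write an empty string when the value is missing. */
    static void writeStringOrEmpty(DataOutput out, String s) throws IOException {
        writeString(out, s == null ? "" : s);
    }

    /** Returns true if the unguarded writer throws a NullPointerException for s. */
    static boolean failsWithNpe(String s) {
        try {
            writeString(new DataOutputStream(new ByteArrayOutputStream()), s);
            return false;
        } catch (NullPointerException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the number of bytes the guarded writer emits for s. */
    static int guardedSize(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            writeStringOrEmpty(new DataOutputStream(buf), s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size();
    }

    public static void main(String[] args) {
        System.out.println("unguarded, null contentType fails: " + failsWithNpe(null));
        System.out.println("unguarded, text/html fails: " + failsWithNpe("text/html"));
        System.out.println("guarded, null contentType bytes written: " + guardedSize(null));
    }
}
```

Whether an empty string is an acceptable substitute depends on how readers of the record treat the content type. The sketch only shows that a null check before serialization avoids the exception; it does not explain why the value would be missing after a temporary redirect.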