Re: [Nutch-general] NullPointerException fetching some sites with temp redirects

2007-07-26 Thread Kai_testing Middleton
I'll try those if I get a chance. (BTW, "Remuneration" is misspelled on
absoluteit.co.nz, if you care.)
--Kai M.

----- Original Message -----
From: Carl Cerecke [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, July 26, 2007 4:21:07 PM
Subject: Re: NullPointerException fetching some sites with temp redirects

Is anybody else getting NullPointerExceptions when fetching either of these
two sites (with 0.90 and the latest trunk)?

http://www.absoluteit.co.nz
http://defence.allmedia.co.nz

I am, but I would be grateful if someone else could test whether they work
or not, so I can rule out Nutch configuration issues.

Cheers,
Carl.
-
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now   http://get.splunk.com/
___
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general


Re: [Nutch-general] NullPointerException fetching some sites with temp redirects

2007-07-26 Thread Carl Cerecke
Is anybody else getting NullPointerExceptions when fetching either of these
two sites (with 0.90 and the latest trunk)?

http://www.absoluteit.co.nz
http://defence.allmedia.co.nz

I am, but I would be grateful if someone else could test whether they work
or not, so I can rule out Nutch configuration issues.

Cheers,
Carl.

Carl Cerecke wrote:
 Hi,
 
 I'm using Nutch 0.9, although I get the same result with a more recent nightly build.
 
 I'm getting an NPE when fetching these two pages:
 
 http://www.absoluteit.co.nz
 and
 http://defence.allmedia.co.nz
 
 I've tracked it down by putting a t.printStackTrace() in the
 catch (Throwable t) block of run() in Fetcher.java:
 java.lang.NullPointerException
   at org.apache.hadoop.io.Text.encode(Text.java:375)
   at org.apache.hadoop.io.Text.encode(Text.java:356)
   at org.apache.hadoop.io.Text.writeString(Text.java:396)
   at org.apache.nutch.protocol.Content.writeCompressed(Content.java:146)
   at org.apache.hadoop.io.CompressedWritable.write(CompressedWritable.java:74)
   at org.apache.nutch.fetcher.FetcherOutput.write(FetcherOutput.java:56)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:315)
   at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:343)
   at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:191)
 
 I'm not sure where to go from here. Any suggestions?
 
 Cheers,
 Carl.
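The debugging step described in this message (catching Throwable in a worker thread's run() and calling printStackTrace()) can be sketched in isolation as below. This is a minimal, hypothetical example: WorkerThread and doFetch are stand-ins for Nutch's Fetcher$FetcherThread and are not Nutch code, and the null contentType only mimics an unset field.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical stand-in for a fetcher worker thread (not Nutch code).
class WorkerThread extends Thread {
    // Last Throwable caught in run(), kept so it can be inspected afterwards.
    private volatile Throwable caught;

    // Hypothetical work method: dereferences a value that was never set.
    private void doFetch() {
        String contentType = null;   // mimics a missing content type
        contentType.length();        // throws NullPointerException
    }

    @Override
    public void run() {
        try {
            doFetch();
        } catch (Throwable t) {
            caught = t;
            t.printStackTrace();     // the debugging step: dump the full trace to stderr
        }
    }

    Throwable getCaught() {
        return caught;
    }

    // Renders a Throwable's stack trace as a String, e.g. for a log message.
    static String traceOf(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        return sw.toString();
    }
}
```

Without the catch-and-print, an exception swallowed by a generic handler leaves no frames to follow; with it, the innermost frames point straight at the method that dereferenced null.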

Re: [Nutch-general] NullPointerException fetching some sites with temp redirects

2007-07-25 Thread Doğacan Güney
Hi,

On 7/25/07, Carl Cerecke [EMAIL PROTECTED] wrote:
 Hi,

 I'm using Nutch 0.9, although I get the same result with a more recent nightly build.

 I'm getting an NPE when fetching these two pages:

 http://www.absoluteit.co.nz
 and
 http://defence.allmedia.co.nz

 I've tracked it down by putting a t.printStackTrace() in the
 catch (Throwable t) block of run() in Fetcher.java:
 java.lang.NullPointerException
   at org.apache.hadoop.io.Text.encode(Text.java:375)
   at org.apache.hadoop.io.Text.encode(Text.java:356)
   at org.apache.hadoop.io.Text.writeString(Text.java:396)
   at org.apache.nutch.protocol.Content.writeCompressed(Content.java:146)
   at org.apache.hadoop.io.CompressedWritable.write(CompressedWritable.java:74)
   at org.apache.nutch.fetcher.FetcherOutput.write(FetcherOutput.java:56)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:315)
   at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:343)
   at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:191)

 I'm not sure where to go from here. Any suggestions?

Can you retry with the latest trunk? Not that I think it will solve
your problem, but Content.java has changed recently, so I am not sure
what was on line 146. If the problem recurs with the latest trunk, I can
check exactly which line is failing. Alternatively, you can send that
part of Content.java.


 Cheers,
 Carl.



-- 
Doğacan Güney


Re: [Nutch-general] NullPointerException fetching some sites with temp redirects

2007-07-25 Thread Carl Cerecke
Hi Doğacan,

Yes, I get the NullPointerException with the latest trunk, too.

Cheers,
Carl.


Re: [Nutch-general] NullPointerException fetching some sites with temp redirects

2007-07-25 Thread Carl Cerecke
Hi, I've included the relevant part of Content.java below. I will retry with the latest trunk shortly.

Content.java:137-149

137  protected final void writeCompressed(DataOutput out) throws IOException {
138    out.writeByte(VERSION);
139
140    Text.writeString(out, url); // write url
141    Text.writeString(out, base); // write base
142
143    out.writeInt(content.length); // write content
144    out.write(content);
145
146    Text.writeString(out, contentType); // write contentType
147
148    metadata.write(out); // write metadata
149  }
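In this excerpt, line 146 is the Text.writeString(out, contentType) call, which suggests (though the thread has not confirmed it) that contentType was null for the redirected page. One common way to make such serialization null-tolerant is to write a presence flag before the string. The sketch below uses plain java.io, with DataOutput.writeUTF/readUTF standing in for Hadoop's Text.writeString/readString; NullableStrings, writeNullable, readNullable and roundTrip are hypothetical names, and this is not the fix Nutch adopted. Because it changes the serialized layout, a real change would presumably also need a VERSION bump.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helpers; writeUTF/readUTF stand in for Text.writeString/readString.
class NullableStrings {
    // Writes a presence flag, then the string itself only when it is non-null.
    static void writeNullable(DataOutput out, String s) throws IOException {
        out.writeBoolean(s != null);
        if (s != null) {
            out.writeUTF(s);
        }
    }

    // Reads back what writeNullable wrote; returns null when the flag is false.
    static String readNullable(DataInput in) throws IOException {
        return in.readBoolean() ? in.readUTF() : null;
    }

    // Serializes a value to bytes and reads it back, like a write/read pair would.
    static String roundTrip(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            writeNullable(out, s);
            out.flush();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return readNullable(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The alternative of substituting an empty string for null keeps the format unchanged but loses the distinction between "no content type" and "empty content type".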


I also noticed that, in the output.collect call in Fetcher.java, a new
FetcherOutput is created with its third argument (the ParseImpl) set to null,
even though the Content argument is not null (it holds the contents of the
page being redirected to).
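To narrow down which value is null before it reaches serialization, a quick check like the one below could be logged just before output.collect. FetchRecord and nullFields are hypothetical: the fields loosely mirror what Content.writeCompressed writes (url, base, content, contentType, metadata), and this is not Nutch's Content class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical record mirroring the values Content.writeCompressed serializes.
class FetchRecord {
    final String url;
    final String base;
    final byte[] content;
    final String contentType;
    final Map<String, String> metadata;

    FetchRecord(String url, String base, byte[] content,
                String contentType, Map<String, String> metadata) {
        this.url = url;
        this.base = base;
        this.content = content;
        this.contentType = contentType;
        this.metadata = metadata;
    }

    // Returns the names of fields that would be null at serialization time,
    // in the same order writeCompressed writes them.
    List<String> nullFields() {
        List<String> missing = new ArrayList<>();
        if (url == null) missing.add("url");
        if (base == null) missing.add("base");
        if (content == null) missing.add("content");
        if (contentType == null) missing.add("contentType");
        if (metadata == null) missing.add("metadata");
        return missing;
    }
}
```

Logging the result alongside the URL would show whether only the redirect targets arrive without a content type.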

Cheers,
Carl.
