[ 
https://issues.apache.org/jira/browse/NUTCH-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated NUTCH-2418:
---------------------------------
    Description: 
{code}
2017-09-05 15:28:54,539 INFO [FetcherThread] org.apache.nutch.fetcher.FetcherThread: FetcherThread 38 fetch of https://www.provinciegroningen.nl/fileadmin/user_upload/Documenten/Downloads/vanturfvntoervfol.pdf failed with: java.lang.NullPointerException
        at org.apache.hadoop.io.Text.encode(Text.java:450)
        at org.apache.hadoop.io.Text.encode(Text.java:431)
        at org.apache.hadoop.io.Text.writeString(Text.java:480)
        at org.apache.nutch.parse.ParseData.write(ParseData.java:168)
        at org.apache.nutch.parse.ParseImpl.write(ParseImpl.java:69)
        at org.apache.hadoop.io.GenericWritable.write(GenericWritable.java:142)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1157)
        at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:610)
        at org.apache.nutch.fetcher.FetcherThread.output(FetcherThread.java:773)
        at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:360)
{code}

Never seen it before, no idea what's going on. Opening issue to track it.

More occurrences found: many fetches from this website throw the same NPE:

{code}
2017-09-25 13:55:08,103 INFO [FetcherThread] org.apache.nutch.fetcher.FetcherThread: FetcherThread 37 fetch of http://www.jabra.com.mx/c/fr/speak510-offert failed with: java.lang.NullPointerException
        at org.apache.hadoop.io.Text.encode(Text.java:450)
        at org.apache.hadoop.io.Text.encode(Text.java:431)
        at org.apache.hadoop.io.Text.writeString(Text.java:480)
        at org.apache.nutch.parse.ParseData.write(ParseData.java:168)
        at org.apache.nutch.parse.ParseImpl.write(ParseImpl.java:69)
        at org.apache.hadoop.io.GenericWritable.write(GenericWritable.java:142)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1157)
        at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:610)
        at org.apache.nutch.fetcher.FetcherThread.output(FetcherThread.java:773)
        at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:360)
{code}
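
Both traces fail in Text.encode, reached from Text.writeString inside ParseData.write, which is consistent with a null String being serialized. Which field is null is not established here. The sketch below only illustrates that failure mode and one possible guard. It uses a plain java.io stand-in rather than Hadoop's Text, and the class NullSafeWrite and both method names are hypothetical, not Nutch or Hadoop code.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class NullSafeWrite {

    // Stand-in for a length-prefixed string write. This is NOT Hadoop's
    // Text.writeString; it only mimics the "encode, then write" shape.
    static void writeString(DataOutput out, String s) throws IOException {
        // Encoding a null String throws NullPointerException before any byte is written.
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Hypothetical guard: serialize a null String as the empty string.
    static void writeStringNullSafe(DataOutput out, String s) throws IOException {
        writeString(out, s == null ? "" : s);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);

        try {
            writeString(out, null);
        } catch (NullPointerException e) {
            System.out.println("unguarded write threw NullPointerException");
        }

        // Only the 4-byte length prefix (value 0) is written for a null input.
        writeStringNullSafe(out, null);
        System.out.println("guarded write produced " + buf.size() + " bytes");
    }
}
```

Whether an empty string is an acceptable substitute, or whether the null should instead be prevented where the value is produced, depends on which value is null, which the traces above do not show.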

  was:
{code}
2017-09-05 15:28:54,539 INFO [FetcherThread] org.apache.nutch.fetcher.FetcherThread: FetcherThread 38 fetch of https://www.provinciegroningen.nl/fileadmin/user_upload/Documenten/Downloads/vanturfvntoervfol.pdf failed with: java.lang.NullPointerException
        at org.apache.hadoop.io.Text.encode(Text.java:450)
        at org.apache.hadoop.io.Text.encode(Text.java:431)
        at org.apache.hadoop.io.Text.writeString(Text.java:480)
        at org.apache.nutch.parse.ParseData.write(ParseData.java:168)
        at org.apache.nutch.parse.ParseImpl.write(ParseImpl.java:69)
        at org.apache.hadoop.io.GenericWritable.write(GenericWritable.java:142)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1157)
        at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:610)
        at org.apache.nutch.fetcher.FetcherThread.output(FetcherThread.java:773)
        at org.apache.nutch.fetcher.FetcherThread.run(FetcherThread.java:360)
{code}

Never seen it before, no idea what's going on. Opening issue to track it.


> NPE in org.apache.hadoop.io.Text from FetcherThread
> ---------------------------------------------------
>
>                 Key: NUTCH-2418
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2418
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.13
>            Reporter: Markus Jelsma



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
