[ https://issues.apache.org/jira/browse/NUTCH-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17778135#comment-17778135 ]

Hudson commented on NUTCH-3012:
-------------------------------

SUCCESS: Integrated in Jenkins build Nutch » Nutch-trunk #133 (See 
[https://ci-builds.apache.org/job/Nutch/job/Nutch-trunk/133/])
NUTCH-3012 SegmentReader when dumping with option -recode: NPE on unparsed 
documents (snagel: 
[https://github.com/apache/nutch/commit/d2c3e96d88818d8107f320c49e007329b020e090])
* (edit) src/java/org/apache/nutch/segment/SegmentReader.java


> SegmentReader when dumping with option -recode: NPE on unparsed documents
> -------------------------------------------------------------------------
>
>                 Key: NUTCH-3012
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3012
>             Project: Nutch
>          Issue Type: Bug
>          Components: segment
>    Affects Versions: 1.19
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.20
>
>
> SegmentReader, when called with the flag {{-recode}}, fails with an NPE when 
> it tries to stringify the raw content of unparsed documents:
> {noformat}
> $> bin/nutch readseg -dump crawl/segments/20231009065431 crawl/segreader/20231009065431 -recode
> ...
> 2023-10-09 07:55:18,451 INFO mapreduce.Job: Task Id : attempt_1696825862783_0005_r_000000_0, Status : FAILED
> Error: java.lang.NullPointerException: charset
>         at java.base/java.lang.String.<init>(String.java:504)
>         at java.base/java.lang.String.<init>(String.java:561)
>         at org.apache.nutch.protocol.Content.toString(Content.java:297)
>         at org.apache.nutch.segment.SegmentReader$InputCompatReducer.reduce(SegmentReader.java:189)
> {noformat}
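The trace indicates that `Content.toString` ends up calling the JDK's `String(byte[], Charset)` constructor with a null charset, which the JDK rejects with `NullPointerException: charset`. The Java sketch below illustrates that failure mode and one null-safe way to stringify raw bytes. It assumes the charset is simply unknown (null) for unparsed documents. The class `RecodeSketch`, the helper `contentToString` and the UTF-8 fallback are hypothetical illustrations, not the change made in commit d2c3e96.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch, not Nutch's actual fix: stringify raw fetched
 * content without passing a null Charset to the String constructor.
 */
public class RecodeSketch {

    /**
     * Decodes raw bytes with the given charset. When no charset is known
     * (assumed here to be the case for documents that were never parsed),
     * it falls back to UTF-8. The fallback choice is an assumption.
     */
    static String contentToString(byte[] content, Charset charset) {
        if (content == null) {
            return "";
        }
        Charset effective = (charset != null) ? charset : StandardCharsets.UTF_8;
        return new String(content, effective);
    }

    public static void main(String[] args) {
        byte[] raw = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);

        // Passing a null Charset reproduces the failure mode from the stack trace.
        try {
            new String(raw, (Charset) null);
            System.out.println("no exception");
        } catch (NullPointerException e) {
            System.out.println("NPE: " + e.getMessage());
        }

        // The null-safe helper decodes with the fallback instead of throwing.
        System.out.println(contentToString(raw, null));
    }
}
```

Whether UTF-8 is an appropriate fallback for a segment dump, or whether the raw content should instead be skipped or left undecoded, depends on the caller.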



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
