[ https://issues.apache.org/jira/browse/NUTCH-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511429#comment-16511429 ]

ASF GitHub Bot commented on NUTCH-2597:
---------------------------------------

sebastian-nagel commented on a change in pull request #349: NUTCH-2597: fixed cleanup()
URL: https://github.com/apache/nutch/pull/349#discussion_r195162047
 
 

 ##########
 File path: src/java/org/apache/nutch/indexer/CleaningJob.java
 ##########
 @@ -64,10 +64,12 @@ public void setConf(Configuration conf) {
       Mapper<Text, CrawlDatum, ByteWritable, Text> {
     private ByteWritable OUT = new ByteWritable(CrawlDatum.STATUS_DB_GONE);
 
+    @Override
     public void setup(Mapper<Text, CrawlDatum, ByteWritable, Text>.Context context) {
     }
 
-    public void cleanup() throws IOException {
+    @Override
+    public void cleanup(Context context) throws IOException {
 
 Review comment:
   Could also remove the method implementation entirely. The superclass (Mapper, 
in this hunk) already implements a do-nothing cleanup(context).
   
   If you have time: there are a couple of other cleanup() methods without the 
context argument. Probably the same mistake, but harmless since they do nothing 
(they are overloads that the framework never calls). This needs a detailed 
check, but `git grep -A2 'cleanup()'` finds a couple of them. 
Thanks, @sju!
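
   To illustrate why the no-argument cleanup() is never invoked, here is a 
minimal sketch that does not depend on Hadoop. BaseTask is a hypothetical 
stand-in for a framework base class such as Mapper, and a plain String stands 
in for the Context type; neither is part of Nutch or Hadoop. The point is only 
that a method with a different parameter list is an overload, not an override, 
and that @Override lets the compiler catch the mistake.

```java
import java.util.ArrayList;
import java.util.List;

public class CleanupOverrideDemo {

  /** Hypothetical stand-in for a framework base class (e.g. a Mapper). */
  static class BaseTask {
    /** Default hook: does nothing, like the framework's default. */
    protected void cleanup(String context) {
    }

    /** The "framework" only ever calls the one-argument hook. */
    final void run(String context) {
      cleanup(context);
    }
  }

  /** Mistake: cleanup() is a new overload, so run() never reaches it. */
  static class BrokenTask extends BaseTask {
    final List<String> log = new ArrayList<>();

    // Adding @Override here would be a compile error, which is the hint.
    public void cleanup() {
      log.add("cleanup() called");
    }
  }

  /** Fix: same signature as the base hook, checked by @Override. */
  static class FixedTask extends BaseTask {
    final List<String> log = new ArrayList<>();

    @Override
    protected void cleanup(String context) {
      log.add("cleanup(" + context + ") called");
    }
  }

  public static void main(String[] args) {
    BrokenTask broken = new BrokenTask();
    broken.run("ctx");
    System.out.println("broken: " + broken.log); // prints "broken: []"

    FixedTask fixed = new FixedTask();
    fixed.run("ctx");
    System.out.println("fixed: " + fixed.log); // prints "fixed: [cleanup(ctx) called]"
  }
}
```

   If the overriding method has nothing to do, deleting it and relying on the 
inherited default gives the same behavior with less code.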



> NPE in updatehostdb
> -------------------
>
>                 Key: NUTCH-2597
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2597
>             Project: Nutch
>          Issue Type: Bug
>          Components: hostdb
>    Affects Versions: 1.15
>            Reporter: Jurian Broertjes
>            Priority: Critical
>
> I get an NPE on updatehostdb. I start with a clean crawlDB & hostDB. After an 
> inject, I do an updatehostdb with -checkAll and get the following stacktrace:
> {code}
> 2018-06-13 10:45:21,958 WARN hostdb.ResolverThread - java.lang.NullPointerException
>  at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1359)
>  at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1400)
>  at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:83)
>  at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)
>  at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>  at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
>  at org.apache.nutch.hostdb.ResolverThread.run(ResolverThread.java:82)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
> Is this related to NUTCH-2375?
> If further testing is needed, please let me know!



