[ 
https://issues.apache.org/jira/browse/NUTCH-2375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313503#comment-16313503
 ] 

ASF GitHub Bot commented on NUTCH-2375:
---------------------------------------

Omkar20895 commented on issue #221: NUTCH-2375 Upgrading nutch to use 
org.apache.hadoop.mapreduce
URL: https://github.com/apache/nutch/pull/221#issuecomment-355616195
 
 
   Hello, apologies for not updating the branch; I was on vacation. I have 
updated the PR, and I am now getting a NullPointerException that I can't quite 
get my head around. 
   
   The exception occurs when I run the command: runtime/deploy/bin/nutch 
generate crawl/crawldb crawl/segments
   
   The exception:
   Error: java.lang.NullPointerException
        at org.apache.nutch.crawl.URLPartitioner.getPartition(URLPartitioner.java:75)
        at org.apache.nutch.crawl.URLPartitioner.getPartition(URLPartitioner.java:39)
        at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:715)
        at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
        at org.apache.nutch.crawl.Generator$SelectorInverseMapper.map(Generator.java:538)
        at org.apache.nutch.crawl.Generator$SelectorInverseMapper.map(Generator.java:531)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
   
   The exception occurs only in pseudo-distributed mode. The code looks fine 
to me, and I do not understand the reason for this exception; I think it has 
something to do with execution in Hadoop's distributed mode. @sebastian-nagel 
@lewismc, could you please suggest why this might be happening so that 
I can work on it? Thanks. 



> Upgrade the code base from org.apache.hadoop.mapred to 
> org.apache.hadoop.mapreduce
> ----------------------------------------------------------------------------------
>
>                 Key: NUTCH-2375
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2375
>             Project: Nutch
>          Issue Type: Improvement
>          Components: deployment
>    Affects Versions: 1.13
>            Reporter: Omkar Reddy
>             Fix For: 1.15
>
>
> Nutch still uses the deprecated org.apache.hadoop.mapred API. It needs to 
> be updated to use org.apache.hadoop.mapreduce. 



