[ 
https://issues.apache.org/jira/browse/NUTCH-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16434531#comment-16434531
 ] 

ASF GitHub Bot commented on NUTCH-2551:
---------------------------------------

HansBrende opened a new pull request #316: fix for NUTCH-2551 contributed by Hans Brende
URL: https://github.com/apache/nutch/pull/316
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> NullPointerException in generator
> ---------------------------------
>
>                 Key: NUTCH-2551
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2551
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 1.15
>            Reporter: Hans Brende
>            Priority: Blocker
>             Fix For: 1.15
>
>
> A NullPointerException is thrown during the crawl generate stage when I 
> deploy to a Hadoop cluster (although, for some reason, it works fine locally).
> This appears to happen because the URLPartitioner class still has the 
> old {{configure()}} method (which is never called, leaving the 
> {{normalizers}} field null) rather than implementing the 
> {{Configurable}} interface, as described in the newer mapreduce API's 
> Partitioner spec.
> Stack trace:
> {code}
> java.lang.NullPointerException
>  at org.apache.nutch.crawl.URLPartitioner.getPartition(URLPartitioner.java:76)
>  at org.apache.nutch.crawl.URLPartitioner.getPartition(URLPartitioner.java:40)
>  at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:716)
>  at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>  at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>  at org.apache.nutch.crawl.Generator$SelectorInverseMapper.map(Generator.java:553)
>  at org.apache.nutch.crawl.Generator$SelectorInverseMapper.map(Generator.java:546)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
> {code}
>  
> It might also be because a *static* URLPartitioner instance is 
> used in the Generator.Selector class, yet it is only initialized in the 
> {{setup()}} method of the Generator.Selector.SelectorMapper class. So that 
> whole setup looks pretty weird.
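
For readers following the diagnosis above, here is a minimal, self-contained sketch of the suspected failure mode. It is not the patch from pull request #316 and does not use the real Hadoop or Nutch classes. Every name in it (PartitionerSketch, LegacyStylePartitioner, ConfigurablePartitioner, the "partition.mode" key, and the Map standing in for Hadoop's Configuration) is hypothetical. The newInstance helper assumes the framework behaves roughly like Hadoop's reflection-based instantiation, which calls setConf() on objects that implement Configurable and knows nothing about a bare configure() method.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Self-contained sketch: every type here is a simplified stand-in,
// not the actual Hadoop or Nutch class.
class PartitionerSketch {

    /** Stand-in for Hadoop's Configurable; a Map plays the role of Configuration. */
    interface Configurable {
        void setConf(Map<String, String> conf);
        Map<String, String> getConf();
    }

    /** Stand-in for the newer mapreduce Partitioner, which has no configure() hook. */
    abstract static class Partitioner {
        abstract int getPartition(String url, int numPartitions);
    }

    /** Shared partitioning logic; throws NullPointerException when mode is null. */
    static int partition(String mode, String url, int numPartitions) {
        String host = URI.create(url).getHost();
        String key = (mode.equals("byHost") && host != null) ? host : url;
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Legacy style: setup lives in configure(), which the simulated framework never calls. */
    static class LegacyStylePartitioner extends Partitioner {
        private String mode; // stays null because nothing invokes configure()

        void configure(Map<String, String> conf) {
            mode = conf.getOrDefault("partition.mode", "byHost");
        }

        @Override
        int getPartition(String url, int numPartitions) {
            return partition(mode, url, numPartitions);
        }
    }

    /** Fixed style: setup happens in setConf(), which the simulated framework does call. */
    static class ConfigurablePartitioner extends Partitioner implements Configurable {
        private Map<String, String> conf;
        private String mode;

        @Override
        public void setConf(Map<String, String> conf) {
            this.conf = conf;
            this.mode = conf.getOrDefault("partition.mode", "byHost");
        }

        @Override
        public Map<String, String> getConf() {
            return conf;
        }

        @Override
        int getPartition(String url, int numPartitions) {
            return partition(mode, url, numPartitions);
        }
    }

    /** Simulated framework instantiation: only Configurable objects receive the configuration. */
    static <T> T newInstance(Class<T> cls, Map<String, String> conf) {
        try {
            T obj = cls.getDeclaredConstructor().newInstance();
            if (obj instanceof Configurable) {
                ((Configurable) obj).setConf(conf);
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        Partitioner fixed = newInstance(ConfigurablePartitioner.class, conf);
        System.out.println("configured partition: " + fixed.getPartition("http://example.org/a", 4));
        try {
            newInstance(LegacyStylePartitioner.class, conf).getPartition("http://example.org/a", 4);
        } catch (NullPointerException e) {
            System.out.println("legacy-style partitioner failed: mode was never initialized");
        }
    }
}
```

In this sketch the legacy-style class fails on the first getPartition() call because its field was never set, which mirrors the reported stack trace. The Configurable variant is initialized at instantiation time, so it does not depend on a mapper's setup() having run first.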



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
