[jira] [Commented] (NUTCH-2551) NullPointerException in generator

Omkar Reddy (JIRA) Mon, 09 Apr 2018 23:50:34 -0700

    [ 
https://issues.apache.org/jira/browse/NUTCH-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431798#comment-16431798
 ]


Omkar Reddy commented on NUTCH-2551:
------------------------------------

I think the issue here is that a new job is being created (Job.getInstance()) 
in the setup() of GeneratorSelectorMapper, and that new job is what gets passed 
when we are configuring the partitioner. This might be the reason for the 
configuration being lost, and hence the NullPointerException.

I don't know why I created a new job in that patch (NUTCH-2375) rather than 
just passing the configuration object to the URLPartitioner.configure() method. 
My bad. This is a quick fix and I will send a PR. Thanks.
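
For what it's worth, here is a minimal, self-contained sketch of why the 
partitioner has to receive its configuration through the Configurable hook. 
As I understand it, the framework instantiates the partitioner reflectively on 
the task side and only configures it if it is Configurable. The sketch does not 
use Hadoop or Nutch classes: the Configurable interface, SketchPartitioner, 
newInstance() and the "sketch.partition.mode" key are simplified stand-ins 
invented for illustration, with a plain Map in place of Hadoop's Configuration. 
It is not the actual patch.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

class ConfigurablePartitionerSketch {

    /** Stand-in for Hadoop's Configurable, reduced to a single method (assumption). */
    public interface Configurable {
        void setConf(Map<String, String> conf);
    }

    /** Hypothetical stand-in for URLPartitioner; not the real Nutch class. */
    public static class SketchPartitioner implements Configurable {
        // Plays the role of the normalizers field: stays null until setConf() runs.
        private String mode;

        @Override
        public void setConf(Map<String, String> conf) {
            // "sketch.partition.mode" is a made-up key for this example.
            mode = conf.getOrDefault("sketch.partition.mode", "byHost");
        }

        public int getPartition(String url, int numPartitions) {
            // Throws NullPointerException here if setConf() was never called.
            int hash = mode.equals("byHost")
                    ? URI.create(url).getHost().hashCode()
                    : url.hashCode();
            return (hash & Integer.MAX_VALUE) % numPartitions;
        }
    }

    /** Mimics a framework that creates the object reflectively, then configures it. */
    public static <T> T newInstance(Class<T> cls, Map<String, String> conf) {
        try {
            T obj = cls.getDeclaredConstructor().newInstance();
            if (obj instanceof Configurable) {
                ((Configurable) obj).setConf(conf);
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Configured through the hook: getPartition() works.
        SketchPartitioner ok = newInstance(SketchPartitioner.class, conf);
        System.out.println("partition = " + ok.getPartition("http://example.org/a", 4));

        // Created directly and never configured: same failure mode as the issue.
        try {
            new SketchPartitioner().getPartition("http://example.org/a", 4);
        } catch (NullPointerException e) {
            System.out.println("unconfigured partitioner threw NullPointerException");
        }
    }
}
```

The point of the sketch is only that configuration done in a method the 
framework never calls leaves the field null on the task side, which would match 
the stack trace quoted below.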

> NullPointerException in generator
> ---------------------------------
>
>                 Key: NUTCH-2551
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2551
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 1.15
>            Reporter: Hans Brende
>            Priority: Blocker
>             Fix For: 1.15
>
>
> A NullPointerException is thrown during the crawl generate stage when I 
> deploy to a hadoop cluster (but for some reason, it works fine locally).
> It looks like this is caused because the URLPartitioner class still has the 
> old {{configure()}} method in there (which is never called, causing the 
> {{normalizers}} field to remain null), rather than implementing the 
> {{Configurable}} interface as detailed in the newer mapreduce API's 
> Partitioner spec.
> Stack trace:
> {code}
> java.lang.NullPointerException
>  at org.apache.nutch.crawl.URLPartitioner.getPartition(URLPartitioner.java:76)
>  at org.apache.nutch.crawl.URLPartitioner.getPartition(URLPartitioner.java:40)
>  at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:716)
>  at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>  at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>  at org.apache.nutch.crawl.Generator$SelectorInverseMapper.map(Generator.java:553)
>  at org.apache.nutch.crawl.Generator$SelectorInverseMapper.map(Generator.java:546)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
> {code}
>  
> Oh and it might also be because a *static* URLPartitioner instance is being 
> used in the Generator.Selector class... but it's only initialized in the 
> {{setup()}} method of the Generator.Selector.SelectorMapper class! So that 
> whole setup looks pretty weird...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
