Re: NPE in org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue

Dennis Kubes Wed, 14 Feb 2007 13:04:01 -0800

It may fix the problem, or it may not. There have been many changes in Hadoop since 0.4; I think they are now on 0.11.x. So if you are upgrading an existing DFS installation that already holds content, that is something to take into consideration. That being said, the changes in Hadoop from 0.4 to the present may very well have fixed the error you are seeing, and to use the most recent version of Hadoop you will need to apply the NUTCH-437 patch.

Looking at your output below, though, my first thought is that the error is caused by something in the PDF parser and not by Hadoop. Nutch uses the pdfbox library to parse PDF files, so you may want to take the specific file and see whether it parses correctly outside of Nutch using pdfbox.
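Before running pdfbox itself on the file, a quick JDK-only check can rule out a truncated or mislabeled file. The sketch below does not use pdfbox and is not a substitute for it: it only looks for the "%PDF-" marker at the start of the file and a "%%EOF" marker near the end. The class name PdfSanityCheck and its method names are made up for illustration, and a file can pass both checks and still fail to parse.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of Nutch, Hadoop, or pdfbox).
public class PdfSanityCheck {

    /** Returns true if the first five bytes of the file are "%PDF-". */
    static boolean hasPdfHeader(String path) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            byte[] head = new byte[5];
            if (f.length() < head.length) {
                return false; // too short to hold the marker
            }
            f.readFully(head);
            return new String(head, StandardCharsets.US_ASCII).equals("%PDF-");
        }
    }

    /** Returns true if "%%EOF" appears in the last 1024 bytes of the file. */
    static boolean hasEofMarker(String path) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            int tailLen = (int) Math.min(1024L, f.length());
            byte[] tail = new byte[tailLen];
            f.seek(f.length() - tailLen);
            f.readFully(tail);
            // ISO-8859-1 maps every byte to one char, so binary data is safe here.
            return new String(tail, StandardCharsets.ISO_8859_1).contains("%%EOF");
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PdfSanityCheck <file.pdf>");
            return;
        }
        System.out.println("header ok:     " + hasPdfHeader(args[0]));
        System.out.println("EOF marker ok: " + hasEofMarker(args[0]));
    }
}
```

Because the file name in the log contains spaces and parentheses, the path should be quoted on the command line. If either check prints false, the file itself is suspect; if both print true, the next step is still to run the file through pdfbox directly.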


Dennis Kubes

Armel T. Nene wrote:

Dennis

I was wondering if this patch could fix my problem, which is, if not the
same, very similar to this one. I am using Nutch 0.8.2-dev; I checked it
out from SVN a while ago but have never updated since. I was able to crawl
10000 XML files before with no errors whatsoever. These are the errors
that I get when I'm fetching:

INFO parser.custom: Custom-parse: Parsing content
file:/C:/TeamBinder/AddressBook/9100/(65)E110_ST A0 (1).pdf
07/02/12 22:09:16 INFO fetcher.Fetcher: fetch of
file:/C:/TeamBinder/AddressBook/9100/(65)E110_ST A0 (1).pdf failed with:
java.lang.NullPointerException
07/02/12 22:09:17 INFO mapred.LocalJobRunner: 0 pages, 0 errors, 0.0 pages/s, 0 kb/s,
07/02/12 22:09:17 FATAL fetcher.Fetcher: java.lang.NullPointerException
07/02/12 22:09:17 FATAL fetcher.Fetcher: at
org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:198)
07/02/12 22:09:17 FATAL fetcher.Fetcher: at
org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:189)
07/02/12 22:09:17 FATAL fetcher.Fetcher: at
org.apache.hadoop.mapred.MapTask$2.collect(MapTask.java:91)
07/02/12 22:09:17 FATAL fetcher.Fetcher: at
org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:314)
07/02/12 22:09:17 FATAL fetcher.Fetcher: at
org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:232)
07/02/12 22:09:17 FATAL fetcher.Fetcher: fetcher
caught:java.lang.NullPointerException

One of the problems is that my Hadoop version is reported as
hadoop-0.4.0-patched. I don't know whether that means I am running the
0.4.0 version, and it is a little confusing. Once you can clarify that
for me, I will be able to apply the patch to my version.

Best Regards,

Armel

-----Original Message-----

From: Dennis Kubes [mailto:[EMAIL PROTECTED]]
Sent: 13 February 2007 21:09
To: [email protected]
Subject: Re: NPE in org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue

Actually, I take it back. I don't think it is the same problem, but I do think it is the right solution.


Dennis Kubes

Dennis Kubes wrote:

This has to do with HADOOP-964. Replace the Hadoop jar files in your Nutch installation with the most recent versions from Hadoop. You will also need to apply the NUTCH-437 patch to get Nutch to work with the most recent changes to the Hadoop codebase.
Dennis Kubes

Gal Nitzan wrote:
Hi,

Does anybody use Nutch trunk?

I am running Nutch 0.9 and am unable to fetch.

After 50-60K URLs I get an NPE in
org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue every time.
I was wondering if anyone has a workaround, or whether something is wrong
with my setup.

I have opened a new issue in jira
http://issues.apache.org/jira/browse/hadoop-1008 for this.

Any clue?

Gal
