A couple of points:

1. You used tabs.
2. You left some unnecessary comments in the source; the bug history is already in JIRA and the commit logs.
3. Why not add a case for this to the test case?
4. The issue could have been iterated on in JIRA a bit further, so all of these could have been caught before the commit.

--
 Sami Siren




Chris A. Mattmann (JIRA) wrote:
     [ http://issues.apache.org/jira/browse/NUTCH-406?page=all ]

Chris A. Mattmann closed NUTCH-406.
-----------------------------------


Patch applied to trunk:

http://svn.apache.org/viewvc?view=rev&revision=478619




Metadata tries to write null values
-----------------------------------

                Key: NUTCH-406
                URL: http://issues.apache.org/jira/browse/NUTCH-406
            Project: Nutch
         Issue Type: Bug
   Affects Versions: 0.9.0
           Reporter: Doğacan Güney
        Assigned To: Chris A. Mattmann
            Fix For: 0.9.0

        Attachments: NUTCH-406.patch, NUTCH-406.patch


During parsing, some URLs (especially PDFs, it seems) may create <some_key, null> pairs in ParseData's parseMeta. When Metadata.write() tries to write such a pair, it throws an NPE.
The stack trace will be something like this:
        at org.apache.hadoop.io.Text.encode(Text.java:373)
        at org.apache.hadoop.io.Text.encode(Text.java:354)
        at org.apache.hadoop.io.Text.writeString(Text.java:394)
        at org.apache.nutch.metadata.Metadata.write(Metadata.java:214)
I can consistently reproduce this using the following URL:
http://www.efesbev.com/corporate_governance/pdf/MergerAgreement.pdf
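
For illustration only, here is a minimal sketch of one way a write method could guard against such <some_key, null> pairs. It is not the patch that was committed for NUTCH-406. SimpleMetadata and toBytes() are hypothetical names, java.io.DataOutput.writeUTF stands in for Hadoop's Text.writeString (both fail on a null string), and silently skipping null values is an assumed policy rather than the project's decision.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a multi-valued metadata container.
// This is NOT Nutch's Metadata class, only an illustration.
class SimpleMetadata {
    private final Map<String, List<String>> entries = new LinkedHashMap<>();

    // Values may be null, mirroring the <some_key, null> pairs in the report.
    void add(String key, String value) {
        entries.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Writes only non-null values; keys left with no values are dropped.
    void write(DataOutput out) throws IOException {
        Map<String, List<String>> clean = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : entries.entrySet()) {
            List<String> nonNull = new ArrayList<>();
            for (String v : e.getValue()) {
                if (v != null) {
                    nonNull.add(v);
                }
            }
            if (!nonNull.isEmpty()) {
                clean.put(e.getKey(), nonNull);
            }
        }
        // Counts are taken after filtering so a reader sees consistent sizes.
        out.writeInt(clean.size());
        for (Map.Entry<String, List<String>> e : clean.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue().size());
            for (String v : e.getValue()) {
                out.writeUTF(v);
            }
        }
    }

    // Convenience helper (assumed, for tests): serialize to a byte array.
    byte[] toBytes() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            write(out);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buf.toByteArray();
    }
}
```

A unit test along these lines could add a key with a null value, call toBytes(), and check that no exception is thrown and that only the non-null pairs are serialized.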

