[ http://issues.apache.org/jira/browse/NUTCH-406?page=all ]
Chris A. Mattmann updated NUTCH-406:
------------------------------------

    Assignee: Chris A. Mattmann

> Metadata tries to write null values
> -----------------------------------
>
>                 Key: NUTCH-406
>                 URL: http://issues.apache.org/jira/browse/NUTCH-406
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 0.9.0
>            Reporter: Doğacan Güney
>         Assigned To: Chris A. Mattmann
>         Attachments: NUTCH-406.patch
>
>
> During parsing, some urls (especially pdfs, it seems) may create <some_key, null> pairs in ParseData's parseMeta.
> When Metadata.write() tries to write such a pair, it causes an NPE.
> Stack trace will be something like this:
>     at org.apache.hadoop.io.Text.encode(Text.java:373)
>     at org.apache.hadoop.io.Text.encode(Text.java:354)
>     at org.apache.hadoop.io.Text.writeString(Text.java:394)
>     at org.apache.nutch.metadata.Metadata.write(Metadata.java:214)
> I can consistently reproduce this using the following url:
> http://www.efesbev.com/corporate_governance/pdf/MergerAgreement.pdf
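
The trace shows Metadata.write() passing a null value straight to the string encoder. Below is a minimal sketch of one way to guard against that. It assumes a simplified, hypothetical stand-in class (NullSafeMetadata, which is not Nutch's Metadata and not the attached NUTCH-406.patch). In this stand-in, DataOutput.writeUTF takes the place of Hadoop's Text.writeString, and null values are dropped before anything is serialized.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a multi-valued metadata container such as
// org.apache.nutch.metadata.Metadata. It is NOT the Nutch class or the
// attached patch; DataOutput.writeUTF stands in for Text.writeString.
public class NullSafeMetadata {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    public void add(String name, String value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    public List<String> getValues(String name) {
        List<String> v = values.get(name);
        return v == null ? new ArrayList<>() : v;
    }

    public int size() {
        return values.size();
    }

    // Writes only non-null values, so a <some_key, null> pair never reaches
    // the string encoder (the source of the NPE in the report).
    public void write(DataOutput out) throws IOException {
        // Keep only names that still have at least one non-null value, so the
        // name count written below matches what is actually serialized.
        Map<String, List<String>> clean = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : values.entrySet()) {
            List<String> kept = new ArrayList<>();
            for (String s : e.getValue()) {
                if (s != null) {
                    kept.add(s);
                }
            }
            if (!kept.isEmpty()) {
                clean.put(e.getKey(), kept);
            }
        }
        out.writeInt(clean.size());
        for (Map.Entry<String, List<String>> e : clean.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue().size());
            for (String s : e.getValue()) {
                out.writeUTF(s);
            }
        }
    }

    // Reads back exactly the layout produced by write().
    public void readFields(DataInput in) throws IOException {
        values.clear();
        int names = in.readInt();
        for (int i = 0; i < names; i++) {
            String name = in.readUTF();
            int count = in.readInt();
            for (int j = 0; j < count; j++) {
                add(name, in.readUTF());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        NullSafeMetadata meta = new NullSafeMetadata();
        meta.add("Content-Type", "application/pdf");
        meta.add("Author", null); // e.g. a field a PDF parser could not fill in
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        meta.write(new DataOutputStream(buf)); // no NPE: the null value is skipped
        NullSafeMetadata copy = new NullSafeMetadata();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(copy.size() + " " + copy.getValues("Content-Type")); // prints "1 [application/pdf]"
    }
}
```

Filtering at write time is only one option. Another would be to stop the parser from putting null values into parseMeta in the first place. The sketch does not say which approach the attached patch takes.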