Metadata tries to write null values
-----------------------------------

Key: NUTCH-406
URL: http://issues.apache.org/jira/browse/NUTCH-406
Project: Nutch
Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Doğacan Güney
During parsing, some URLs (especially PDFs, it seems) may create <some_key, null> pairs in ParseData's parseMeta. When Metadata.write() tries to write such a pair, it throws an NPE. The stack trace looks something like this:

    at org.apache.hadoop.io.Text.encode(Text.java:373)
    at org.apache.hadoop.io.Text.encode(Text.java:354)
    at org.apache.hadoop.io.Text.writeString(Text.java:394)
    at org.apache.nutch.metadata.Metadata.write(Metadata.java:214)

I can consistently reproduce this using the following URL:

http://www.efesbev.com/corporate_governance/pdf/MergerAgreement.pdf
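One possible guard against this NPE is to drop null names and values before serializing. The sketch below is a hypothetical, simplified stand-in. It is not Nutch's Metadata class and not the actual fix for NUTCH-406. The class name NullSafeMetadata is invented for illustration, and java.io's writeUTF substitutes for Hadoop's Text.writeString, since both throw an NPE when handed a null string. The per-name value count is computed after filtering, so readFields() stays consistent with write().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical multi-valued metadata container that never serializes nulls. */
public class NullSafeMetadata {
    private final Map<String, List<String>> entries = new LinkedHashMap<>();

    public void add(String name, String value) {
        entries.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    public List<String> get(String name) {
        return entries.getOrDefault(name, new ArrayList<>());
    }

    public void write(DataOutput out) throws IOException {
        // Filter first, so every count we write matches the strings that follow it.
        Map<String, List<String>> clean = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : entries.entrySet()) {
            if (e.getKey() == null) continue;       // writeUTF(null) would throw an NPE
            List<String> vals = new ArrayList<>();
            for (String v : e.getValue()) {
                if (v != null) vals.add(v);         // skip <key, null> pairs
            }
            if (!vals.isEmpty()) clean.put(e.getKey(), vals);
        }
        out.writeInt(clean.size());
        for (Map.Entry<String, List<String>> e : clean.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeInt(e.getValue().size());
            for (String v : e.getValue()) out.writeUTF(v);
        }
    }

    public void readFields(DataInput in) throws IOException {
        entries.clear();
        int names = in.readInt();
        for (int i = 0; i < names; i++) {
            String name = in.readUTF();
            int count = in.readInt();
            for (int j = 0; j < count; j++) add(name, in.readUTF());
        }
    }

    public static void main(String[] args) throws IOException {
        NullSafeMetadata meta = new NullSafeMetadata();
        meta.add("Content-Type", "application/pdf");
        meta.add("title", null); // the kind of pair a PDF parser might leave behind

        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        meta.write(new DataOutputStream(buf)); // no NPE: the null value is skipped

        NullSafeMetadata copy = new NullSafeMetadata();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(copy.get("Content-Type")); // [application/pdf]
        System.out.println(copy.get("title"));        // []
    }
}
```

Silently dropping nulls at write time is only one option. Rejecting null values when they are first added would surface the offending parser earlier, at the cost of failing the parse instead of just losing one metadata entry.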