Nguyen Manh Tien created NUTCH-1673:
---------------------------------------
Summary: Title isn't reset in MoreIndexingFilter
Key: NUTCH-1673
URL: https://issues.apache.org/jira/browse/NUTCH-1673
Project: Nutch
Issue Type: Bug
Components: indexer
Affects Versions: 2.2.1
Reporter: Nguyen Manh Tien
In the resetTitle function, the title is added to the doc without removing the
old one first. We need to remove the old title before adding the new one.
Currently this results in an error when indexing to Solr if the title field is
not a multivalued field.
private NutchDocument resetTitle(NutchDocument doc, WebPage page, String url) {
  ...
  for (int i = 0; i < patterns.length; i++) {
    if (matcher.contains(contentDisposition.toString(), patterns[i])) {
      ...
      doc.add("title", result.group(1));
      break;
    }
  }
  return doc;
}
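
A possible fix is to drop the existing title right before adding the new one. The sketch below only illustrates that idea and is not the actual patch. SimpleDoc is a hypothetical stand-in for NutchDocument so the example compiles on its own, java.util.regex replaces the matcher used in the filter, and the filename pattern is an assumption, not the filter's real pattern list. In the real filter the equivalent change would presumably be a doc.removeField("title") call before doc.add(...), assuming the NutchDocument in the affected version exposes removeField.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ResetTitleSketch {

  /** Hypothetical stand-in for NutchDocument: field name mapped to its values. */
  static final class SimpleDoc {
    private final Map<String, List<String>> fields = new HashMap<>();

    void add(String name, String value) {
      fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    void removeField(String name) {
      fields.remove(name);
    }

    List<String> get(String name) {
      return fields.getOrDefault(name, new ArrayList<>());
    }
  }

  // Assumed pattern for illustration only; the real filter keeps its own list.
  private static final Pattern[] PATTERNS = {
    Pattern.compile("\\bfilename=['\"]?([^'\";]+)['\"]?")
  };

  static SimpleDoc resetTitle(SimpleDoc doc, String contentDisposition) {
    if (contentDisposition == null) {
      return doc;
    }
    for (Pattern pattern : PATTERNS) {
      Matcher m = pattern.matcher(contentDisposition);
      if (m.find()) {
        // Remove the old title first so the field stays single-valued.
        doc.removeField("title");
        doc.add("title", m.group(1));
        break;
      }
    }
    return doc;
  }

  public static void main(String[] args) {
    SimpleDoc doc = new SimpleDoc();
    doc.add("title", "Title parsed from the page");
    resetTitle(doc, "attachment; filename=\"report.pdf\"");
    System.out.println(doc.get("title"));
  }
}
```

With this change the document ends up with exactly one title value when a pattern matches, and the original title is left untouched when none does.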
--
This message was sent by Atlassian JIRA
(v6.1#6144)