So I've been getting this error,
"multiple_values_encountered_for_non_multiValued_field_title", every once
in a while when I run solrindex. I can now say that it is caused by the
index-more plugin (MoreIndexingFilter.java):
    private NutchDocument resetTitle(NutchDocument doc, ParseData data,
                                     String url) {
        String contentDisposition =
            data.getMeta(Metadata.CONTENT_DISPOSITION);
        if (contentDisposition == null)
            return doc;
        for (int i = 0; i < patterns.length; i++) {
            Matcher matcher = patterns[i].matcher(contentDisposition);
            if (matcher.find()) {
                doc.add("title", matcher.group(1));
                break;
            }
        }
        return doc;
    }
The problem here is that, in my case, this function is not resetting the
title; it is just adding a second one. It seems the original idea was that
if CONTENT_DISPOSITION exists, the document will not already have a title
set by other plugins (namely index-basic). Unfortunately, that is not
always the case, as you can see by running this command:
bin/nutch indexchecker http://www.2modern.com/site/gift-registry.html
What I get (the relevant part) is:
tstamp : Tue Feb 21 13:18:13 PST 2012
type : text/html
type : text
type : html
date : Tue Feb 21 13:18:13 PST 2012
url : http://www.2modern.com/site/gift-registry.html
content : 2Modern Gift Registry Modern Furniture & Lighting items
in cart 0 checkout Returning 2Modern cu
user_ranking : 25.0
title : 2Modern Gift Registry
title : gift-registry.html
plutoz_ranking : 10.0
categories : Furniture Home
contentLength : 12924
As you can see, there are two titles. I think this would be very easy to
fix: just check whether a title already exists before setting the file
name as the title:
    if (contentDisposition == null || null != doc.getField("title"))
        return doc;
Or, if the substitution must happen whenever CONTENT_DISPOSITION is
present, at least remove the old title first:
    if (matcher.find()) {
        doc.remove("title");
        doc.add("title", matcher.group(1));
        break;
    }
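To make the second option concrete, here is a minimal, self-contained sketch of the remove-then-add behavior. It does not use the Nutch classes: Doc is a hypothetical stand-in for NutchDocument that only mimics the append-on-add behavior, and the FILENAME pattern is an assumption for illustration, not the pattern array the plugin actually uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleResetSketch {
    // Hypothetical stand-in for NutchDocument: field name -> list of values.
    static final class Doc {
        private final Map<String, List<Object>> fields = new LinkedHashMap<>();

        // Mimics the behavior described in this mail: add() always appends.
        void add(String name, Object value) {
            fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }

        void remove(String name) {
            fields.remove(name);
        }

        List<Object> values(String name) {
            return fields.getOrDefault(name, new ArrayList<>());
        }
    }

    // Assumed pattern, for illustration only.
    static final Pattern FILENAME = Pattern.compile("filename=\"?([^\";]+)\"?");

    static Doc resetTitle(Doc doc, String contentDisposition) {
        if (contentDisposition == null) {
            return doc;
        }
        Matcher matcher = FILENAME.matcher(contentDisposition);
        if (matcher.find()) {
            doc.remove("title");                 // drop any title set earlier
            doc.add("title", matcher.group(1));  // then set the file name
        }
        return doc;
    }

    public static void main(String[] args) {
        Doc doc = new Doc();
        doc.add("title", "2Modern Gift Registry"); // as index-basic would
        resetTitle(doc, "inline; filename=\"gift-registry.html\"");
        System.out.println(doc.values("title"));   // prints [gift-registry.html]
    }
}
```

Without the remove() call the same run would leave two values under "title", which is what Solr rejects.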
Now, that being said, the real question here is: why doesn't NutchDocument
observe the schema.xml file, and why does it always assume that all fields
are multivalued?
    public void add(String name, Object value) {
        NutchField field = fields.get(name);
        if (field == null) {
            field = new NutchField(value);
            fields.put(name, field);
        } else {
            field.add(value);   // <---- always appends a second value
        }
    }
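For illustration only, this is roughly what a schema-aware add() could look like. It is a sketch under assumptions, not Nutch code: SchemaAwareDocSketch and its singleValued set are hypothetical, and the set is assumed to have been filled from the fields in schema.xml that are not marked multiValued.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical document class; not part of Nutch.
class SchemaAwareDocSketch {
    private final Map<String, List<Object>> fields = new HashMap<>();
    // Assumption: names of fields the schema declares as single-valued.
    private final Set<String> singleValued;

    SchemaAwareDocSketch(Set<String> singleValued) {
        this.singleValued = new HashSet<>(singleValued);
    }

    void add(String name, Object value) {
        List<Object> values = fields.get(name);
        if (values == null) {
            values = new ArrayList<>();
            fields.put(name, values);
        } else if (singleValued.contains(name)) {
            values.clear(); // single-valued field: the last value wins
        }
        values.add(value);
    }

    List<Object> values(String name) {
        List<Object> v = fields.get(name);
        return v == null ? new ArrayList<>() : new ArrayList<>(v);
    }

    public static void main(String[] args) {
        Set<String> single = new HashSet<>();
        single.add("title");
        SchemaAwareDocSketch doc = new SchemaAwareDocSketch(single);
        doc.add("title", "2Modern Gift Registry");
        doc.add("title", "gift-registry.html");
        doc.add("type", "text/html");
        doc.add("type", "text");
        System.out.println(doc.values("title")); // prints [gift-registry.html]
        System.out.println(doc.values("type"));  // prints [text/html, text]
    }
}
```

Letting the last value win is just one possible policy; keeping the first value or throwing an error would be equally defensible, and which one is right depends on the order in which the indexing filters run.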
--
Kaveh Minooie
www.plutoz.com