I think I found a bug --> multiple_values_encountered_for_non_multiValued_field_title

kaveh minooie Tue, 21 Feb 2012 16:49:10 -0800

So I've been getting the error "multiple_values_encountered_for_non_multiValued_field_title" every once in a while when I run solrindex. I can now say that it is caused by the index-more plugin (MoreIndexingFilter.java):

private NutchDocument resetTitle(NutchDocument doc, ParseData data, String url) {
  String contentDisposition = data.getMeta(Metadata.CONTENT_DISPOSITION);
  if (contentDisposition == null)
    return doc;

  for (int i = 0; i < patterns.length; i++) {
    Matcher matcher = patterns[i].matcher(contentDisposition);
    if (matcher.find()) {
      // use the file name from Content-Disposition as the title;
      // this ADDS a value, so an existing title is kept alongside it
      doc.add("title", matcher.group(1));
      break;
    }
  }
  return doc;
}
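For anyone wondering where "gift-registry.html" comes from: the loop takes the first regex in patterns that matches the Content-Disposition value and uses its first capture group as the title. Below is a small standalone sketch of just that step. The two patterns are my assumption of roughly what the plugin uses (a quoted filename=..., then an unquoted one); they are not copied from MoreIndexingFilter, so check the plugin source for the exact expressions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch of pulling a file name out of a Content-Disposition value.
 * ASSUMPTION: these patterns only approximate the ones kept in
 * MoreIndexingFilter's "patterns" array.
 */
class ContentDispositionTitle {

  // Hypothetical patterns: quoted filename first, then unquoted.
  static final Pattern[] PATTERNS = {
      Pattern.compile("\\bfilename=['\"](.+)['\"]"),
      Pattern.compile("\\bfilename=(\\S+)\\b")
  };

  /** Returns the first captured file name, or null if nothing matches. */
  static String fileNameFrom(String contentDisposition) {
    if (contentDisposition == null) {
      return null;
    }
    for (Pattern p : PATTERNS) {
      Matcher m = p.matcher(contentDisposition);
      if (m.find()) {
        return m.group(1); // first pattern that matches wins, like the "break"
      }
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(fileNameFrom("attachment; filename=\"gift-registry.html\""));
    System.out.println(fileNameFrom("inline; filename=gift-registry.html"));
  }
}
```

Both calls in main print gift-registry.html, which is exactly the second title showing up in the indexchecker output.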

The problem is that, in my case, this function is not resetting the title but just adding a new one. It seems the original idea was that if CONTENT_DISPOSITION exists, the document will not already have a title set by another plugin (namely index-basic). Unfortunately that is not always the case, as you can see by running this command:


bin/nutch indexchecker http://www.2modern.com/site/gift-registry.html

What I get (the relevant part) is:

        
tstamp :        Tue Feb 21 13:18:13 PST 2012
type :  text/html
type :  text
type :  html
date :  Tue Feb 21 13:18:13 PST 2012
url :   http://www.2modern.com/site/gift-registry.html

content : 2Modern Gift Registry Modern Furniture & Lighting itemsin cart 0 checkout Returning 2Modern cu

user_ranking :  25.0
title : 2Modern Gift Registry
title : gift-registry.html
plutoz_ranking :        10.0
categories :    Furniture Home
contentLength : 12924

As you can see, there are two titles. I think this would be very easy to fix: just check whether a title already exists before setting the file name as the title:


if (contentDisposition == null || null != doc.getField("title"))
  return doc;
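A minimal, self-contained sketch of this first fix. MiniDoc is a hypothetical stand-in for NutchDocument that only mimics its append-on-add behaviour, and this resetTitle receives the already-extracted file name instead of doing the regex matching, so it illustrates the guard rather than being a patch against the plugin.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of fix #1: keep an existing title. MiniDoc is HYPOTHETICAL. */
class KeepExistingTitle {

  static class MiniDoc {
    final Map<String, List<Object>> fields = new LinkedHashMap<>();

    void add(String name, Object value) {
      // Mimics NutchDocument.add(): a second add() appends another value.
      fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    List<Object> getField(String name) {
      return fields.get(name);
    }
  }

  /** Uses the file name only when no other filter has set a title yet. */
  static MiniDoc resetTitle(MiniDoc doc, String fileName) {
    if (fileName == null || doc.getField("title") != null) {
      return doc;
    }
    doc.add("title", fileName);
    return doc;
  }

  public static void main(String[] args) {
    MiniDoc doc = new MiniDoc();
    doc.add("title", "2Modern Gift Registry"); // e.g. set earlier by index-basic
    resetTitle(doc, "gift-registry.html");
    System.out.println(doc.getField("title")); // [2Modern Gift Registry]
  }
}
```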

Or, if the substitution must happen whenever CONTENT_DISPOSITION is present, at least remove the old title first:


if (matcher.find()) {
  doc.removeField("title");  // drop any title set by an earlier filter
  doc.add("title", matcher.group(1));
  break;
}
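The same kind of sketch for the second option, where the Content-Disposition file name always wins. Again MiniDoc and its removeField are hypothetical stand-ins written for this illustration, and the file name is passed in directly instead of coming from the regex loop.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of fix #2: replace the title. MiniDoc is HYPOTHETICAL. */
class ReplaceTitle {

  static class MiniDoc {
    final Map<String, List<Object>> fields = new LinkedHashMap<>();

    void add(String name, Object value) {
      // Mimics NutchDocument.add(): a second add() appends another value.
      fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    List<Object> getField(String name) {
      return fields.get(name);
    }

    List<Object> removeField(String name) {
      return fields.remove(name);
    }
  }

  /** Replaces any existing title with the Content-Disposition file name. */
  static MiniDoc resetTitle(MiniDoc doc, String fileName) {
    if (fileName == null) {
      return doc;
    }
    doc.removeField("title"); // remove the old value(s) first...
    doc.add("title", fileName); // ...so exactly one title remains
    return doc;
  }

  public static void main(String[] args) {
    MiniDoc doc = new MiniDoc();
    doc.add("title", "2Modern Gift Registry");
    resetTitle(doc, "gift-registry.html");
    System.out.println(doc.getField("title")); // [gift-registry.html]
  }
}
```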

Now, that being said, the real question is why NutchDocument doesn't observe schema.xml and always treats every field as multi-valued:


public void add(String name, Object value) {
  NutchField field = fields.get(name);
  if (field == null) {
    field = new NutchField(value);
    fields.put(name, field);
  } else {
    field.add(value);  // <-- a second add() always appends another value
  }
}
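To illustrate what "observing the schema" could look like, here is a hypothetical schema-aware add(). This is not how Nutch works: the set of multi-valued field names is simply passed to the constructor, whereas a real version would have to read the multiValued="true" fields from schema.xml. For a single-valued field, add() replaces the existing value instead of appending.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * HYPOTHETICAL schema-aware document. The multi-valued field names would
 * come from schema.xml in a real implementation; here they are passed in.
 */
class SchemaAwareDoc {
  private final Set<String> multiValued;
  private final Map<String, List<Object>> fields = new LinkedHashMap<>();

  SchemaAwareDoc(Set<String> multiValuedFields) {
    this.multiValued = new HashSet<>(multiValuedFields);
  }

  void add(String name, Object value) {
    List<Object> values = fields.get(name);
    if (values == null) {
      values = new ArrayList<>();
      fields.put(name, values);
    } else if (!multiValued.contains(name)) {
      // Single-valued field: replace rather than append, so Solr never
      // receives two values for it.
      values.clear();
    }
    values.add(value);
  }

  List<Object> getValues(String name) {
    List<Object> values = fields.get(name);
    if (values == null) {
      return Collections.emptyList();
    }
    return Collections.unmodifiableList(values);
  }

  public static void main(String[] args) {
    SchemaAwareDoc doc = new SchemaAwareDoc(new HashSet<>(Arrays.asList("type")));
    doc.add("type", "text/html");
    doc.add("type", "text");
    doc.add("title", "2Modern Gift Registry");
    doc.add("title", "gift-registry.html");
    System.out.println(doc.getValues("type"));  // [text/html, text]
    System.out.println(doc.getValues("title")); // [gift-registry.html]
  }
}
```

One caveat with replace semantics: the last filter to run wins, so the resulting title would depend on plugin order. Throwing an exception on the second add() of a single-valued field would instead surface the conflict inside Nutch rather than at solrindex time.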

--
Kaveh Minooie

www.plutoz.com
