ExtractingRequestHandler applies fname settings to literals
-----------------------------------------------------------
Key: SOLR-3386
URL: https://issues.apache.org/jira/browse/SOLR-3386
Project: Solr
Issue Type: Bug
Components: contrib - Solr Cell (Tika extraction)
Affects Versions: 3.5
Reporter: Colin Hebert
Priority: Minor
The SolrContentHandler.addLiterals() method calls
SolrContentHandler.addField(), which itself obtains the field with
SolrContentHandler.findMappedName().
This call makes sense for SolrContentHandler.addMetadata() [and others]
because the user can't set the names of those fields otherwise. With literals,
however, the name of the field is given explicitly by the user, so it shouldn't
be changed at all (maybe applying unknownFieldPrefix or defaultField could be
done, but even that doesn't seem quite normal).
----
I ran into this issue with the following use case:
I have a schema containing a "title" field which is mandatory and contains only
one value.
My documents have an internal title which is used as the value of the "title"
field.
When sending one of these documents (an HTML document), if it contains
"title" metadata I get an exception because I have multiple values for my
"title" field (as I would expect).
To fix that I used "fname.title=tika_title", so the title provided by tika is
kept under another name.
However, both titles (the original one I pass manually as a literal, and the
metadata one) are now named "tika_title", and I get an exception because
"title" hasn't been provided at all.
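
The collision described above can be reproduced with a toy model. This is only
a sketch: FnameCollisionDemo, mapName() and the sample values are hypothetical
stand-ins, not Solr's actual code, and it assumes the fname mapping is a plain
one-pass name lookup applied to every field that is added.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the reported behavior; NOT the real SolrContentHandler. */
final class FnameCollisionDemo {

    /** Hypothetical stand-in for findMappedName(): apply fname.<source>=<target> if set. */
    static String mapName(Map<String, String> fnames, String name) {
        return fnames.getOrDefault(name, name);
    }

    /** Adds a value under the mapped name, for literals and metadata alike. */
    static void addField(Map<String, List<String>> doc, Map<String, String> fnames,
                         String name, String value) {
        doc.computeIfAbsent(mapName(fnames, name), k -> new ArrayList<>()).add(value);
    }

    /** Builds the document from the use case: one literal title plus one Tika title. */
    static Map<String, List<String>> indexWithFnameTitle() {
        Map<String, String> fnames = new LinkedHashMap<>();
        fnames.put("title", "tika_title");                       // fname.title=tika_title

        Map<String, List<String>> doc = new LinkedHashMap<>();
        addField(doc, fnames, "title", "Internal title");        // the literal sent by the user
        addField(doc, fnames, "title", "Title from HTML head");  // metadata extracted by Tika
        return doc;
    }

    public static void main(String[] args) {
        // Both values end up under "tika_title"; "title" is never populated.
        System.out.println(indexWithFnameTitle());
    }
}
```

In this model the mandatory "title" field is missing from the result, which
matches the exception described in the report.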
----
An easy workaround for this bug is to send the literal as "my_title" and to
add the following fnames: "fname.my_title=title&fname.title=tika_title". A
small switcheroo which puts the correct value back in the expected field.
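
As an illustration, the request parameters for this workaround could be
assembled as follows. This is a sketch only: the WorkaroundParams class and the
sample title are made up for the example, and "/update/extract" is assumed to
be the path where the ExtractingRequestHandler is registered.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds the query string for the workaround; the class name is hypothetical. */
final class WorkaroundParams {

    /** Sends the literal as "my_title" and swaps the names back with fname.* settings. */
    static String build(String internalTitle) {
        String value;
        try {
            value = URLEncoder.encode(internalTitle, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform.
            throw new IllegalStateException(e);
        }
        return "literal.my_title=" + value   // literal under a temporary name
             + "&fname.my_title=title"       // temporary name -> the real "title" field
             + "&fname.title=tika_title";    // Tika's title -> "tika_title"
    }

    public static void main(String[] args) {
        // Assumed handler path; adjust to the local solrconfig.xml.
        System.out.println("/update/extract?" + build("Internal title"));
    }
}
```

Note that this relies on each name being mapped only once (my_title becomes
title and is not mapped again to tika_title), which is what the workaround in
this report implies.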
----
A way to fix that is to extract the first blocks of
SolrContentHandler.addField() into a separate method (or to put the lowerNames
check in SolrContentHandler.findMappedName()) and to call that method (or
findMappedName()) _before_ calling SolrContentHandler.addField(), so that
literals can skip the name mapping.
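
The shape of that fix can be sketched with a toy model. Again, this is not a
patch against Solr: LiteralBypassSketch and its methods are hypothetical, and
the lowerNames, unknownFieldPrefix and defaultField handling is left out. The
only point is that the name is resolved by the caller, so the literal path can
bypass the mapping.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the proposed fix; NOT the real SolrContentHandler. */
final class LiteralBypassSketch {

    /** Hypothetical stand-in for findMappedName(). */
    static String mapName(Map<String, String> fnames, String name) {
        return fnames.getOrDefault(name, name);
    }

    /** addField() no longer maps anything: it expects an already resolved name. */
    static void addField(Map<String, List<String>> doc, String resolvedName, String value) {
        doc.computeIfAbsent(resolvedName, k -> new ArrayList<>()).add(value);
    }

    /** Metadata path: the name is mapped before addField() is called. */
    static void addMetadata(Map<String, List<String>> doc, Map<String, String> fnames,
                            String name, String value) {
        addField(doc, mapName(fnames, name), value);
    }

    /** Literal path: the user chose the name, so it is used unchanged. */
    static void addLiteral(Map<String, List<String>> doc, String name, String value) {
        addField(doc, name, value);
    }

    /** Same use case as in the report: literal title plus Tika title. */
    static Map<String, List<String>> indexWithFnameTitle() {
        Map<String, String> fnames = new LinkedHashMap<>();
        fnames.put("title", "tika_title");                       // fname.title=tika_title

        Map<String, List<String>> doc = new LinkedHashMap<>();
        addLiteral(doc, "title", "Internal title");
        addMetadata(doc, fnames, "title", "Title from HTML head");
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(indexWithFnameTitle());
    }
}
```

With this split, the literal stays in "title" and only the extracted metadata
is renamed to "tika_title".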