[
https://issues.apache.org/jira/browse/SOLR-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801752#comment-13801752
]
Simon Endele edited comment on SOLR-5375 at 10/22/13 12:08 PM:
---------------------------------------------------------------
It's not as easy as I first thought, because there is another issue that
bothers me and touches on this one:
I would expect fmap to be applied only to the values returned from Tika, not
to literals. As a result, it is currently not possible to declare the
following mapping (assuming lowernames=true):
literal.content_type => schema field "content_type"
content_type from Tika => schema field "content_type_tika"
This is what the following request should do IMO:
literal.content_type=mytype&fmap.content_type=content_type_tika
Instead, both values are stored in "content_type_tika".
The same problem exists for "lowernames". If it is enabled, it is not possible
to fill schema fields whose names contain upper-case letters using a
ContentStreamUpdateRequest.
But this is a question of expected behavior, and I'm afraid changing it would
cause backwards compatibility issues.
What do you think?
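
To make the two behaviors concrete, here is a minimal sketch of the name resolution discussed above. It is not the SolrContentHandler code: the class FmapSketch, the method names, and the literalsBypassMapping flag are hypothetical and exist only for illustration. The only facts taken from Solr are that lowernames=true maps "Content-Type" to "content_type" and that fmap renames a field.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical illustration class, not part of Solr.
class FmapSketch {
    // Lowercase the name and replace non-alphanumeric characters with '_',
    // e.g. "Content-Type" -> "content_type" (the effect of lowernames=true).
    static String lowerName(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            sb.append(Character.isLetterOrDigit(c) ? Character.toLowerCase(c) : '_');
        }
        return sb.toString();
    }

    // Resolve the schema field for an incoming name.
    // fromLiteral marks names that came from literal.* parameters.
    // literalsBypassMapping=false models the behavior described in the comment;
    // literalsBypassMapping=true models the proposed behavior (assumption).
    static String resolve(String name, boolean fromLiteral, boolean lowerNames,
                          Map<String, String> fmap, boolean literalsBypassMapping) {
        if (fromLiteral && literalsBypassMapping) {
            return name; // literals keep the exact name given in the request
        }
        String n = lowerNames ? lowerName(name) : name;
        return fmap.getOrDefault(n, n);
    }

    public static void main(String[] args) {
        Map<String, String> fmap =
            Collections.singletonMap("content_type", "content_type_tika");
        for (boolean bypass : new boolean[] {false, true}) {
            System.out.println("literalsBypassMapping=" + bypass);
            System.out.println("  literal.content_type -> "
                + resolve("content_type", true, true, fmap, bypass));
            System.out.println("  Tika Content-Type    -> "
                + resolve("Content-Type", false, true, fmap, bypass));
        }
    }
}
```

With the request from the comment, the first mode sends both names to "content_type_tika", while the second keeps the literal in "content_type" and only renames the Tika value. The second mode would also let a literal such as "myField" reach a schema field with upper-case letters even when lowernames=true.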
> Param "literalsOverride" for ExtractingRequestHandler / SolrCell does not
> consider "lowernames"
> -----------------------------------------------------------------------------------------------
>
> Key: SOLR-5375
> URL: https://issues.apache.org/jira/browse/SOLR-5375
> Project: Solr
> Issue Type: Bug
> Components: contrib - Solr Cell (Tika extraction)
> Reporter: Simon Endele
> Priority: Minor
> Fix For: 4.6
>
>
> Can be reproduced with the following command and the example configuration
> shipped with Solr:
> cd exampledocs
> curl -F "[email protected]"
> "http://localhost:8983/solr/update/extract?commit=true&literal.id=myid&literalsOverride=true&lowernames=true&literal.content_type=mytype"
> The added doc contains both values:
> http://localhost:8983/solr/collection1/select?q=id%3Amyid&wt=xml&indent=true
> {code:xml}<arr name="content_type">
> <str>mytype</str>
> <str>application/xml</str>
> </arr>{code}
> If the corresponding field is not multi-valued, the request raises an
> org.apache.solr.common.SolrException: "ERROR: multiple values encountered for
> non multiValued field content_type: ...".
> Debugging the code (Solr 4.4.0), I found that the parameter "lowernames"
> is not taken into account in several places in
> org.apache.solr.handler.extraction.SolrContentHandler that look like:
> {code}if (literalsOverride && literalFieldNames.contains(name))
> continue;
> {code}
> The same problem occurs for the following command (though its correctness
> could be discussed):
> curl -F "[email protected]"
> "http://localhost:8983/solr/update/extract?commit=true&literal.id=myid&literalsOverride=true&lowernames=false&fmap.Content-Type=content_type&literal.content_type=mytype"
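
The mismatch in the quoted check can be reproduced in isolation. The following is a minimal sketch, not the actual SolrContentHandler code and not a proposed patch: the class LiteralsOverrideSketch and its methods are hypothetical. It assumes that Tika reports the name as "Content-Type" and that the literal was registered as "content_type", as in the two curl commands above.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class, not part of Solr.
class LiteralsOverrideSketch {
    // Same normalization as lowernames=true: "Content-Type" -> "content_type".
    static String lowerName(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            sb.append(Character.isLetterOrDigit(c) ? Character.toLowerCase(c) : '_');
        }
        return sb.toString();
    }

    // Mirrors the quoted check: the raw Tika name is compared to the literal names.
    static boolean skipRaw(String tikaName, Set<String> literalFieldNames,
                           boolean literalsOverride) {
        return literalsOverride && literalFieldNames.contains(tikaName);
    }

    // Assumed alternative: apply lowernames and fmap first, then compare the
    // resulting field name to the literal names.
    static boolean skipMapped(String tikaName, Set<String> literalFieldNames,
                              boolean literalsOverride, boolean lowerNames,
                              Map<String, String> fmap) {
        String name = lowerNames ? lowerName(tikaName) : tikaName;
        name = fmap.getOrDefault(name, name);
        return literalsOverride && literalFieldNames.contains(name);
    }

    public static void main(String[] args) {
        Set<String> literals = Collections.singleton("content_type");
        Map<String, String> noMap = Collections.emptyMap();
        Map<String, String> fmap =
            Collections.singletonMap("Content-Type", "content_type");

        // First command: lowernames=true, no fmap.
        System.out.println(skipRaw("Content-Type", literals, true));
        System.out.println(skipMapped("Content-Type", literals, true, true, noMap));
        // Second command: lowernames=false, fmap.Content-Type=content_type.
        System.out.println(skipMapped("Content-Type", literals, true, false, fmap));
    }
}
```

In this sketch the raw comparison never skips the Tika value, which matches the duplicate values shown in the description, while the comparison on the mapped name skips it in both scenarios.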