[jira] [Comment Edited] (SOLR-5375) Param "literalsOverride" for ExtractingRequestHandler / SolrCell does not consider "lowernames"

Simon Endele (JIRA) Tue, 22 Oct 2013 05:09:10 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-5375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13801752#comment-13801752
 ]


Simon Endele edited comment on SOLR-5375 at 10/22/13 12:08 PM:
---------------------------------------------------------------

It's not as easy as I first thought, because there's another issue that
bothers me and is related to this one:
In my view, fmap should only be applied to the values returned from
Tika, not to literals. So currently it is not possible to declare the
following mapping (assuming lowernames=true):
literal.content_type => schema field "content_type"
content_type from Tika => schema field "content_type_tika"

This is what the following request should do IMO: 
literal.content_type=mytype&fmap.content_type=content_type_tika
Instead both values are stored to "content_type_tika".
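To make the expected behavior concrete, here is a minimal Java sketch of how field names could be resolved if fmap applied only to Tika metadata and never to literals. It starts from the Tika name after lowernames has already turned "Content-Type" into "content_type". The class and method names (FieldNameResolver, resolveTika, resolveLiteral) are hypothetical and not part of Solr.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Solr code: models the *proposed* rule that
// fmap.* mappings apply only to names coming from Tika, never to literals.
class FieldNameResolver {
    private final Map<String, String> fmap;

    FieldNameResolver(Map<String, String> fmap) {
        this.fmap = fmap;
    }

    // Names extracted by Tika go through fmap (falling back to the name itself).
    String resolveTika(String name) {
        return fmap.getOrDefault(name, name);
    }

    // Names given as literal.<name> are used verbatim.
    String resolveLiteral(String name) {
        return name;
    }

    public static void main(String[] args) {
        Map<String, String> fmap = new HashMap<>();
        fmap.put("content_type", "content_type_tika"); // fmap.content_type=content_type_tika
        FieldNameResolver r = new FieldNameResolver(fmap);
        System.out.println(r.resolveLiteral("content_type")); // content_type
        System.out.println(r.resolveTika("content_type"));    // content_type_tika
    }
}
```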

The same problem exists for "lowernames": if it is enabled, it is not possible to
fill schema fields containing upper-case letters using a ContentStreamUpdateRequest.
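The lowernames side can be sketched the same way. The normalization below (letters and digits lowercased, every other character replaced by "_") is an assumption modeled on the documented effect of lowernames, where "Content-Type" becomes "content_type"; it is not a copy of Solr's code, and lowerName is a hypothetical helper. If such a rule is applied to literal names as well, a schema field like "myField" can never be targeted.

```java
// Hypothetical sketch of a lowernames-style normalization
// (an assumption, not Solr's actual implementation).
class LowerNames {
    static String lowerName(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char ch = name.charAt(i);
            // Keep letters and digits (lowercased); replace anything else with '_'.
            sb.append(Character.isLetterOrDigit(ch) ? Character.toLowerCase(ch) : '_');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(lowerName("Content-Type")); // content_type (Tika metadata name)
        System.out.println(lowerName("myField"));      // myfield: schema field "myField" is unreachable
    }
}
```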

But this is a question of expected behavior, and I'm afraid changing it would
cause backwards compatibility issues.
What do you think?


> Param "literalsOverride" for ExtractingRequestHandler / SolrCell does not 
> consider "lowernames"
> -----------------------------------------------------------------------------------------------
>
>                 Key: SOLR-5375
>                 URL: https://issues.apache.org/jira/browse/SOLR-5375
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - Solr Cell (Tika extraction)
>            Reporter: Simon Endele
>            Priority: Minor
>             Fix For: 4.6
>
>
> Can be reproduced with the following command and the example configuration 
> shipped with Solr:
> cd exampledocs
> curl -F "[email protected]" \
>   "http://localhost:8983/solr/update/extract?commit=true&literal.id=myid&literalsOverride=true&lowernames=true&literal.content_type=mytype";
> The added doc contains both values:
> http://localhost:8983/solr/collection1/select?q=id%3Amyid&wt=xml&indent=true
> {code:xml}<arr name="content_type">
>     <str>mytype</str>
>     <str>application/xml</str>
> </arr>{code}
> If the corresponding field is not multi-valued, the request raises an 
> org.apache.solr.common.SolrException: "ERROR: multiple values encountered for 
> non multiValued field content_type: ...".
> Debugging the code (Solr 4.4.0), I found that the parameter "lowernames"
> is not taken into account in several places in
> org.apache.solr.handler.extraction.SolrContentHandler that look like:
> {code}if (literalsOverride && literalFieldNames.contains(name))
>         continue;
> {code}
> The same problem occurs for the following command (though whether this usage
> is correct is debatable):
> curl -F "[email protected]" \
>   "http://localhost:8983/solr/update/extract?commit=true&literal.id=myid&literalsOverride=true&lowernames=false&fmap.Content-Type=content_type&literal.content_type=mytype";
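Both reproduction commands can be modeled outside Solr with a small Java sketch. Everything here is hypothetical: contentTypeValues, the lowerName rule, and the document reduced to a Map<String, List<String>> are stand-ins for SolrContentHandler internals, not its real API. It also assumes lowernames is applied before fmap. It contrasts the reported check on the raw Tika name with a variant that compares the name after lowernames and fmap have been applied, which is one possible fix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the literalsOverride check; not SolrContentHandler's real code.
class LiteralsOverrideDemo {
    // Assumed lowernames normalization: lowercase letters/digits, '_' for everything else.
    static String lowerName(String name) {
        StringBuilder sb = new StringBuilder();
        for (char ch : name.toCharArray()) {
            sb.append(Character.isLetterOrDigit(ch) ? Character.toLowerCase(ch) : '_');
        }
        return sb.toString();
    }

    // Returns the values stored in "content_type" after adding literal.content_type=mytype
    // and Tika's Content-Type=application/xml with literalsOverride=true.
    // checkMappedName=false mimics the reported check on the raw Tika name.
    static List<String> contentTypeValues(boolean lowerNames, Map<String, String> fmap,
                                          boolean checkMappedName) {
        Set<String> literalFieldNames = new HashSet<>();
        literalFieldNames.add("content_type");
        Map<String, List<String>> doc = new HashMap<>();
        doc.computeIfAbsent("content_type", k -> new ArrayList<>()).add("mytype");

        String tikaName = "Content-Type";
        String mapped = lowerNames ? lowerName(tikaName) : tikaName; // lowernames first
        mapped = fmap.getOrDefault(mapped, mapped);                  // then fmap.*
        String checked = checkMappedName ? mapped : tikaName;
        if (!literalFieldNames.contains(checked)) {
            // No literal claims the checked name, so the Tika value is added as well.
            doc.computeIfAbsent(mapped, k -> new ArrayList<>()).add("application/xml");
        }
        return doc.get("content_type");
    }

    public static void main(String[] args) {
        Map<String, String> noFmap = new HashMap<>();
        Map<String, String> fmap = new HashMap<>();
        fmap.put("Content-Type", "content_type"); // fmap.Content-Type=content_type

        // First command: lowernames=true, no fmap.
        System.out.println(contentTypeValues(true, noFmap, false)); // [mytype, application/xml]
        System.out.println(contentTypeValues(true, noFmap, true));  // [mytype]
        // Second command: lowernames=false, fmap.Content-Type=content_type.
        System.out.println(contentTypeValues(false, fmap, false));  // [mytype, application/xml]
        System.out.println(contentTypeValues(false, fmap, true));   // [mytype]
    }
}
```

With the raw-name check, both commands leave two values in content_type, matching the behavior reported above; checking the mapped name leaves only the literal.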



--
This message was sent by Atlassian JIRA
(v6.1#6144)
