[
https://issues.apache.org/jira/browse/SOLR-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eric Pugh resolved SOLR-5362.
-----------------------------
Resolution: Won't Fix
In Solr 10 we are leveraging either Tika Server (running in its own separate
server process) or maybe Tika Pipes (again, running in a separate JVM).
Please revalidate your issue against Solr 10 with one of those options, and if
the need is still present, I am happy to work with you on a fix using the new
approach for Tika.
> SolrCell's order of field operation with lowernames=true
> --------------------------------------------------------
>
> Key: SOLR-5362
> URL: https://issues.apache.org/jira/browse/SOLR-5362
> Project: Solr
> Issue Type: Improvement
> Components: contrib - Solr Cell (Tika extraction)
> Reporter: Chaiyasit (Sit) Manovit
> Priority: Major
>
> This follows from SOLR-1634.
> I am not sure if SOLR-1856 completely fixes SOLR-1634, particularly when
> {{lowernames=true}} comes into the picture. Consider a case where:
> 1. Tika generated field {{Category=Foo}} for a doc (e.g., this comes from
> user-defined document properties).
> 2. {{literalsOverride=true}}.
> 3. {{lowernames=true}}.
> 4. User supplied {{literal.category=bar}}.
> According to the
> [rules|http://wiki.apache.org/solr/ExtractingRequestHandler#Order_of_field_operations],
> {{literalsOverride}} is applied before {{lowernames}} and, thus, will have
> no effect here since the field {{Category}} from Tika and
> {{literal.category}} are considered different fields at this stage before
> {{lowernames=true}} kicks in. And when {{lowernames=true}} kicks in, it has
> the effect of merging {{Category}} into {{category}}, giving it both values
> {{Foo}} and {{bar}}.
> Adding {{fmap.Category=tika_category}} does not help because {{fmap}} is
> applied even later; by that time {{category}} already contains both {{Foo}}
> and {{bar}}.
> Adding {{fmap.Category=tika_category}} *and* setting {{lowernames=false}}
> would work (regardless of {{literalsOverride}}), but what if we need
> {{lowernames=true}}, and what if the capitalization of {{Category}} can vary
> (e.g., {{CATEGORY}})?
> Would it make sense to have an option to apply the rules in the order that
> they are specified in the config file or URL params rather than always in a
> static order?
> Thanks.
> PS. Marking this as Major because there seems to be no easy workaround
> (condition for Minor).
> ------------------------
> Response from Jan Høydahl
> ([link|https://issues.apache.org/jira/browse/SOLR-1634?focusedCommentId=13797273&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13797273]):
> bq. To me it sounds like a potential, very simple solution would be to apply
> lowercasing at several places if {{lowernames=true}}
> Agreed. In particular, apply {{lowernames=true}} as soon as Tika has
> extracted a field, before {{literalsOverride}} is even considered.
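
The ordering problem described in the quoted issue can be sketched with a toy
model. This is not Solr Cell's actual code: the function names
(merge_static_order, merge_lowercase_first) and the dict-based document are
hypothetical, and lowernames is simplified here to plain lowercasing. The
first function applies literalsOverride before lowernames, as the issue
describes; the second lowercases Tika field names first, as suggested in the
quoted response.

```python
def lowercase_keys(fields):
    """Merge fields whose names differ only by case (simplified lowernames)."""
    merged = {}
    for name, values in fields.items():
        merged.setdefault(name.lower(), []).extend(values)
    return merged


def merge_static_order(tika_fields, literals, literals_override=True, lowernames=True):
    """Hypothetical model: literalsOverride is checked first, lowernames last."""
    doc = {}
    for name, values in tika_fields.items():
        # The override check compares names before any lowercasing happens.
        if literals_override and name in literals:
            continue
        doc.setdefault(name, []).extend(values)
    for name, value in literals.items():
        doc.setdefault(name, []).append(value)
    return lowercase_keys(doc) if lowernames else doc


def merge_lowercase_first(tika_fields, literals, literals_override=True, lowernames=True):
    """Hypothetical model: Tika names are lowercased before the override check."""
    normalized = lowercase_keys(tika_fields) if lowernames else dict(tika_fields)
    doc = {}
    for name, values in normalized.items():
        if literals_override and name in literals:
            continue  # the literal now wins because the names match
        doc.setdefault(name, []).extend(values)
    for name, value in literals.items():
        doc.setdefault(name, []).append(value)
    return doc


tika = {"Category": ["Foo"]}      # field extracted by Tika
literals = {"category": "bar"}    # literal.category=bar

static_result = merge_static_order(tika, literals)
proposed_result = merge_lowercase_first(tika, literals)
print(static_result)
print(proposed_result)
```

In this model the static order leaves category holding both Foo and bar, while
lowercasing first lets the literal replace the Tika value, including when the
extracted name is capitalized differently (e.g., CATEGORY).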
--
This message was sent by Atlassian Jira
(v8.20.10#820010)