Marco Remy created SOLR-13356:
---------------------------------

             Summary: Language detection per value
                 Key: SOLR-13356
                 URL: https://issues.apache.org/jira/browse/SOLR-13356
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
          Components: contrib - LangId
            Reporter: Marco Remy


Hello,

We are using the _LangDetect_ language detection processor with individual 
field mapping.
{code:xml}
<processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
  ...
  <bool name="langid.map">true</bool>
  <bool name="langid.map.individual">true</bool>
</processor>
{code}
If a (simply structured) document containing text in different languages within a 
+multivalued field+ is indexed, only one language is detected for the whole field.

e.g.:
{code:xml}
<doc>
  <field name="field">This is any text</field>
  <field name="field">Das ist irgendein Text</field>
</doc>
{code}
The result is either {{field_en}} or {{field_de}}, and both values are mapped 
into that single localized field. As a result, one of the values is not 
analyzed according to its actual language.

As an enhancement, language detection should be available per value for 
multivalued fields, so that each value can be mapped individually.
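To make the difference concrete, here is a minimal Java sketch of the two mapping strategies. It is not the Solr LangId implementation: {{toyDetect}} is a hypothetical keyword-based stand-in for a real detector such as LangDetect, the class and method names are made up for illustration, and modeling the current behavior as a single detection over the joined values is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class PerValueLangMapping {

    // Hypothetical toy detector (NOT a real language detector):
    // returns "de" if a German marker word appears, otherwise "en".
    static String toyDetect(String text) {
        String padded = " " + text.toLowerCase() + " ";
        return (padded.contains(" das ") || padded.contains(" ist ")) ? "de" : "en";
    }

    // Current behavior as described in the issue (assumed here to be one
    // detection over all values): every value goes into field_<lang>.
    static Map<String, List<String>> mapWholeField(String field, List<String> values,
                                                   Function<String, String> detect) {
        String lang = detect.apply(String.join(" ", values));
        Map<String, List<String>> out = new LinkedHashMap<>();
        out.put(field + "_" + lang, new ArrayList<>(values));
        return out;
    }

    // Proposed behavior: detect each value separately and append it
    // to the multivalued field for its own language.
    static Map<String, List<String>> mapPerValue(String field, List<String> values,
                                                 Function<String, String> detect) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (String value : values) {
            String lang = detect.apply(value);
            out.computeIfAbsent(field + "_" + lang, k -> new ArrayList<>()).add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> values = List.of("This is any text", "Das ist irgendein Text");
        System.out.println(mapWholeField("field", values, PerValueLangMapping::toyDetect));
        System.out.println(mapPerValue("field", values, PerValueLangMapping::toyDetect));
    }
}
```

With the two example values, the whole-field variant puts both into {{field_de}}, while the per-value variant produces {{field_en}} and {{field_de}} with one value each. Grouping values by detected language keeps the output as one ordinary multivalued field per language.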



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)