[
https://issues.apache.org/jira/browse/SOLR-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Marco Remy updated SOLR-13356:
------------------------------
Description:
Hello,
We are using the _LangDetect_ language detection processor with individual
field mapping.
{code:xml}
<processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
...
<bool name="langid.map">true</bool>
<bool name="langid.map.individual">true</bool>
</processor>
{code}
If a (simply structured) document containing different languages in a
+multivalued field+ is indexed, only one language is predicted for the whole
field.
For example:
{code:xml}
<doc>
<field>This is any text</field>
<field>Das ist irgendein Text</field>
</doc>
{code}
The result is either {{field_en}} or {{field_de}}, and both values are mapped
into that one localized field. As a result, some values are not analyzed
according to their actual language.
As an enhancement, detection should be available per value on multivalued
fields, so that each value can be mapped individually.
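To illustrate the difference, here is a minimal sketch in Python. It is not Solr code: {{detect_language}} is a hypothetical toy detector based on stopword counts (not the LangDetect algorithm), and {{map_per_field}} / {{map_per_value}} are made-up names. It assumes that the current behaviour amounts to one detection over the combined values of the field.

```python
from collections import defaultdict

# Hypothetical toy detector: counts stopword hits per language.
# This is NOT the LangDetect algorithm, only a stand-in for illustration.
STOPWORDS = {
    "en": {"this", "is", "any", "the", "and"},
    "de": {"das", "ist", "irgendein", "der", "und"},
}


def detect_language(text):
    """Return the language code whose stopwords occur most often in text."""
    words = text.lower().split()
    scores = {lang: sum(w in stop for w in words) for lang, stop in STOPWORDS.items()}
    return max(scores, key=scores.get)


def map_per_field(name, values):
    """Assumed current behaviour: one detection for all values of the field."""
    lang = detect_language(" ".join(values))
    return {f"{name}_{lang}": list(values)}


def map_per_value(name, values):
    """Requested behaviour: detect and map each value on its own."""
    mapped = defaultdict(list)
    for value in values:
        mapped[f"{name}_{detect_language(value)}"].append(value)
    return dict(mapped)


if __name__ == "__main__":
    values = ["This is any text", "Das ist irgendein Text"]
    print(map_per_field("field", values))
    print(map_per_value("field", values))
```

With the two example values, {{map_per_field}} puts both values into a single localized field, while {{map_per_value}} yields one localized field per detected language.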
was:
Hello,
We are using the _LangDetect_ language detection processor with individual
field mapping.
{code:xml}
<processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
...
<bool name="langid.map">true</bool>
<bool name="langid.map.individual">true</bool>
</processor>
{code}
If a (simple structured) document is indexed containing different languages in
a +multivalued field+, only one language will be predicted.
eg:
{code:xml}
<doc>
<field>This is any text</field>
<field>Das ist irgendein Text</field>
</doc>
{code}
The result will be either {{field_en}} or {{field_de}} and both values are
mapped into that localized field. In effect some values won't be analyzed
properly according to their actual language.
As enhancement, the detection should be available per value on multivalued
fields. So their values of can be mapped individually.
> Language detection per value
> ----------------------------
>
> Key: SOLR-13356
> URL: https://issues.apache.org/jira/browse/SOLR-13356
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: contrib - LangId
> Reporter: Marco Remy
> Priority: Minor
> Labels: UpdateProcessor, detection, language
>
> Hello,
> We are using the _LangDetect_ language detection processor with individual
> field mapping.
> {code:xml}
> <processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
> ...
> <bool name="langid.map">true</bool>
> <bool name="langid.map.individual">true</bool>
> </processor>
> {code}
> If a (simply structured) document containing different languages in a
> +multivalued field+ is indexed, only one language is predicted for the whole
> field.
> For example:
> {code:xml}
> <doc>
> <field>This is any text</field>
> <field>Das ist irgendein Text</field>
> </doc>
> {code}
> The result is either {{field_en}} or {{field_de}}, and both values are mapped
> into that one localized field. As a result, some values are not analyzed
> according to their actual language.
> As an enhancement, detection should be available per value on multivalued
> fields, so that each value can be mapped individually.