[ https://issues.apache.org/jira/browse/SOLR-11774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400111#comment-16400111 ]
Jan Høydahl commented on SOLR-11774:
------------------------------------
See [PR 336|https://github.com/apache/lucene-solr/pull/336] for a failing test.
My plan for fixing this is:
* Change {{protected abstract List<DetectedLanguage> detectLanguage(SolrInputDocument content);}} to {{protected abstract List<DetectedLanguage> detectLanguage(Reader content);}}
* Add a new method to {{LanguageIdentifierUpdateProcessor}}: {{protected Reader solrDocReader(SolrInputDocument doc, String[] fields)}}. It will replace {{concatFields()}} and retrieve only as much field data as the reader consumes.
* To detect the language of a single field, return a reader for that field only: {{detectLanguage(solrDocReader(doc, new String[] {fieldName}))}}
* The implementations become simpler, and the default LangDetectLIURP can take advantage of the {{public void append(Reader reader)}} method.
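A minimal sketch of the reader idea from the plan above, under stated assumptions: a plain {{Map<String, List<Object>>}} stands in for {{SolrInputDocument}}, and the extra {{maxChars}} parameter, the {{FieldsReader}} class and the {{readAll}} helper are hypothetical names for illustration only, not the Solr API. The reader pulls field values one at a time, so data beyond what the consumer reads (or beyond {{maxChars}}) is never concatenated.

{code:java}
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.*;

public class SolrDocReaderSketch {

    /** Hypothetical Reader that pulls field values lazily and stops after maxChars chars. */
    static final class FieldsReader extends Reader {
        private final Iterator<String> values;
        private String current = "";
        private int pos = 0;
        private int remaining;

        FieldsReader(Iterator<String> values, int maxChars) {
            this.values = values;
            this.remaining = maxChars;
        }

        @Override
        public int read(char[] cbuf, int off, int len) {
            if (len == 0) {
                return 0;
            }
            if (remaining <= 0) {
                return -1; // character budget used up
            }
            // Advance to the next value that still has unread characters
            while (pos >= current.length()) {
                if (!values.hasNext()) {
                    return -1; // no more field data
                }
                current = values.next();
                pos = 0;
            }
            int n = Math.min(Math.min(len, current.length() - pos), remaining);
            current.getChars(pos, pos + n, cbuf, off);
            pos += n;
            remaining -= n;
            return n;
        }

        @Override
        public void close() {
        }
    }

    /** Hypothetical stand-in for solrDocReader(SolrInputDocument doc, String[] fields). */
    static Reader solrDocReader(Map<String, List<Object>> doc, String[] fields, int maxChars) {
        // Values are produced lazily; each one is followed by a space separator
        Iterator<String> values = Arrays.stream(fields)
                .flatMap(f -> doc.getOrDefault(f, Collections.emptyList()).stream())
                .filter(Objects::nonNull)
                .map(v -> v + " ")
                .iterator();
        return new FieldsReader(values, maxChars);
    }

    /** Drains a Reader into a String (demo helper only). */
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[64];
        try {
            int n;
            while ((n = reader.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, List<Object>> doc = new HashMap<>();
        doc.put("title", Arrays.<Object>asList("Hallo Welt"));
        doc.put("author", Arrays.<Object>asList("Jan"));
        // All fields from langid.fl
        System.out.println(readAll(solrDocReader(doc, new String[] {"title", "author"}, 10000)));
        // A single field, as needed for langid.map.individual
        System.out.println(readAll(solrDocReader(doc, new String[] {"title"}, 10000)));
        // Input capped at 5 characters
        System.out.println(readAll(solrDocReader(doc, new String[] {"title", "author"}, 5)));
    }
}
{code}

The same method covers both cases in the plan: passing all {{langid.fl}} fields gives the document-level reader, and passing a one-element array gives the per-field reader.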
This is a breaking API change, but since the class is still tagged as
{{@lucene.experimental}}, we are allowed to do that, right?
> langid.map.individual won't work with langid.map.keepOrig
> ---------------------------------------------------------
>
> Key: SOLR-11774
> URL: https://issues.apache.org/jira/browse/SOLR-11774
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: contrib - LangId
> Affects Versions: 5.0
> Reporter: Marco Remy
> Assignee: Jan Høydahl
> Priority: Minor
> Fix For: 6.6.4, 7.4, master (8.0)
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Tried to get language detection to work.
> *Setting:*
> {code:xml}
> <processor
> class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
> <str name="langid.fl">title,author</str>
> <str name="langid.langsField">detected_languages</str>
> <str name="langid.whitelist">de,en</str>
> <str name="langid.fallback">txt</str>
> <bool name="langid.map">true</bool>
> <bool name="langid.map.individual">true</bool>
> <bool name="langid.map.keepOrig">true</bool>
> </processor>
> {code}
> Main purpose:
> * Map fields individually
> * Keep the original field
> But the fields are not mapped individually; they are all mapped to a single
> detected language. After some hours of investigation I finally found the
> reason: *the option langid.map.keepOrig breaks the individual mapping
> function.* Only when it is disabled are the fields mapped as expected.
> - Regards
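The behavior the reporter expects from {{langid.map.individual}} combined with {{langid.map.keepOrig}} can be sketched as follows. This is a hypothetical simplification, not the processor's actual code: the document is a plain {{Map}}, the language detector is passed in as a function, and the {{<field>_<lang>}} naming with whitelist/fallback handling is assumed from the configuration above.

{code:java}
import java.util.*;
import java.util.function.Function;

public class IndividualMappingSketch {

    /**
     * Hypothetical helper: detects the language of each field on its own and
     * copies the value to "<field>_<lang>"; the original field is kept only if
     * keepOrig is true.
     */
    static Map<String, Object> mapIndividually(Map<String, Object> doc, List<String> fields,
            Function<String, String> detect, Set<String> whitelist, String fallback,
            boolean keepOrig) {
        Map<String, Object> out = new LinkedHashMap<>(doc);
        for (String field : fields) {
            Object value = doc.get(field);
            if (value == null) {
                continue;
            }
            // Detect on this field's text only, not on all fields concatenated
            String lang = detect.apply(value.toString());
            if (!whitelist.contains(lang)) {
                lang = fallback; // e.g. langid.fallback=txt
            }
            out.put(field + "_" + lang, value);
            if (!keepOrig) {
                out.remove(field);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "Der schnelle braune Fuchs");
        doc.put("author", "The quick brown fox");
        // Toy detector for illustration only: "de" if the text contains "Der", else "en"
        Function<String, String> detect = s -> s.contains("Der") ? "de" : "en";
        Set<String> whitelist = new HashSet<>(Arrays.asList("de", "en"));
        // With keepOrig=true the expected keys are title, author, title_de, author_en
        System.out.println(mapIndividually(doc, Arrays.asList("title", "author"),
                detect, whitelist, "txt", true).keySet());
    }
}
{code}

In this sketch keepOrig only decides whether the source field is removed; it has no influence on which language each field is mapped to, which is the independence the issue reports as broken.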
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)