[ 
https://issues.apache.org/jira/browse/SOLR-11774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400111#comment-16400111
 ] 

Jan Høydahl commented on SOLR-11774:
------------------------------------

See [PR 336|https://github.com/apache/lucene-solr/pull/336] for failing test.

My plan for fixing this is:
 * Change
{{protected abstract List<DetectedLanguage> detectLanguage(SolrInputDocument content);}}
to
{{protected abstract List<DetectedLanguage> detectLanguage(Reader content);}}
 * Add a new method to {{LanguageIdentifierUpdateProcessor}}:
{{protected Reader solrDocReader(SolrInputDocument doc, String[] fields)}}
This will replace {{concatFields()}} and read only as much field data as the 
consumer of the reader asks for
 * To detect the language of a single field, pass a reader over just that field:
{{detectLanguage(solrDocReader(doc, fieldName))}}
 * The implementations become simpler, and the default LangDetectLIURP can use 
the {{public void append(Reader reader)}} method
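
A minimal sketch of how the proposed {{solrDocReader}} could pull field data lazily. This is an illustration under stated assumptions, not the actual patch: a plain {{Map}} stands in for {{SolrInputDocument}} so the example runs without Solr on the classpath, and the class name {{SolrDocReaderSketch}}, the helper {{readAll}}, and the single-space separator between fields are all hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

public class SolrDocReaderSketch {

    /**
     * Returns a Reader over the selected fields of the document.
     * A field value is only fetched when the previous one is exhausted,
     * so a consumer that stops early never touches the remaining fields.
     * Assumption: a Map stands in for SolrInputDocument.
     */
    public static Reader solrDocReader(final Map<String, Object> doc, final String[] fields) {
        return new Reader() {
            private int next = 0;          // index of the next field to open
            private Reader current = null; // reader over the field being consumed

            @Override
            public int read(char[] buf, int off, int len) throws IOException {
                if (len == 0) {
                    return 0;
                }
                while (true) {
                    if (current == null) {
                        if (next >= fields.length) {
                            return -1; // all requested fields consumed
                        }
                        Object value = doc.get(fields[next++]);
                        if (value == null) {
                            continue; // skip fields missing from the document
                        }
                        // Assumption: separate field values with one space.
                        current = new StringReader(value + " ");
                    }
                    int n = current.read(buf, off, len);
                    if (n > 0) {
                        return n;
                    }
                    current = null; // this field is exhausted, move on
                }
            }

            @Override
            public void close() {
                current = null;
                next = fields.length;
            }
        };
    }

    /** Hypothetical helper that drains a Reader into a String. */
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8];
        int n;
        while ((n = reader.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "Hello world");
        doc.put("author", "Jan");

        // All configured fields, as used for whole-document detection.
        System.out.println(readAll(solrDocReader(doc, new String[] {"title", "author"})));
        // One field only, as used when langid.map.individual is enabled.
        System.out.println(readAll(solrDocReader(doc, new String[] {"title"})));
    }
}
```

With this shape, per-field detection is the same call with a one-element field array, which is why the individual-mapping case no longer needs a separate code path.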

This is a breaking API change, but since the class is still tagged as 
{{@lucene.experimental}}, we are allowed to do that, aren't we?

> langid.map.individual won't work with langid.map.keepOrig
> ---------------------------------------------------------
>
>                 Key: SOLR-11774
>                 URL: https://issues.apache.org/jira/browse/SOLR-11774
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - LangId
>    Affects Versions: 5.0
>            Reporter: Marco Remy
>            Assignee: Jan Høydahl
>            Priority: Minor
>             Fix For: 6.6.4, 7.4, master (8.0)
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Tried to get language detection to work.
> *Setting:*
> {code:xml}
> <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
>   <str name="langid.fl">title,author</str>
>   <str name="langid.langsField">detected_languages</str>
>   <str name="langid.whitelist">de,en</str>
>   <str name="langid.fallback">txt</str>
>   <bool name="langid.map">true</bool>
>   <bool name="langid.map.individual">true</bool>
>   <bool name="langid.map.keepOrig">true</bool>
> </processor>
> {code}
> Main purpose:
> * Map fields individually
> * Keep the original field
> But the fields are not mapped individually. They are all mapped to a single 
> detected language. After some hours of investigation I finally found the 
> reason: *The option langid.map.keepOrig breaks the individual mapping 
> function.* Only when it is disabled are the fields mapped as expected.
> - Regards



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
