[ 
https://issues.apache.org/jira/browse/SOLR-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128407#comment-13128407
 ] 

Jan Høydahl commented on SOLR-2839:
-----------------------------------

Cool. The reasoning behind a list of detected languages was that a more 
advanced detector could go sentence by sentence and tag multilingual documents 
correctly. FAST had that capability.

How does this impl compare with the Tika one for short texts? And wouldn't it 
make more sense to add this at the Tika level, with the detection method 
configurable? Then all Tika users would benefit from it.
                
> add alternative language detection impl
> ---------------------------------------
>
>                 Key: SOLR-2839
>                 URL: https://issues.apache.org/jira/browse/SOLR-2839
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 3.5, 4.0
>
>         Attachments: SOLR-2839.patch
>
>
> Based on http://code.google.com/p/language-detection (Apache License); 
> supports 53 languages.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



