[ 
https://issues.apache.org/jira/browse/SOLR-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128409#comment-13128409
 ] 

Robert Muir commented on SOLR-2839:
-----------------------------------

{quote}
How does this impl compare with the Tika one for short texts?
{quote}

I have no idea, probably not that great? But I didn't compare it to Tika.
Regarding short texts, see:
http://shuyo.wordpress.com/2011/09/29/langdetect-is-updatedadded-profiles-of-estonian-lithuanian-latvian-slovene-and-so-on/

{quote}
And wouldn't it make more sense to add this on the Tika level letting the 
detection method be configurable? Then all Tika users would benefit from it.
{quote}

If someone wants to do this, then we can remove this implementation at that 
time. But for Lucene/Solr, I am able to commit to this project, and I think 
that it's important for langid to be pluggable to different implementations.

For example, maybe someone ports Google's detector 
(http://src.chromium.org/viewvc/chrome/trunk/src/third_party/cld/) to Java and 
we expose that too, which might be interesting for short texts.
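
The pluggability argument above could be sketched roughly as follows. This is 
only an illustration: every name here (LanguageDetector, StopwordDetector, 
LangIdProcessor) is hypothetical and not taken from the SOLR-2839 patch or 
from any Solr/Tika API, and the stopword-counting detector is a toy stand-in 
for a real detection library.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical plug-in point; not the actual Solr interface. */
interface LanguageDetector {
    /** Returns a language code such as "en", or "unknown" if nothing matches. */
    String detect(String text);
}

/** Toy implementation: picks the language whose stopwords occur most often. */
class StopwordDetector implements LanguageDetector {
    private final Map<String, Set<String>> stopwords = new LinkedHashMap<>();

    StopwordDetector() {
        stopwords.put("en", new HashSet<>(Arrays.asList("the", "and", "is", "of")));
        stopwords.put("de", new HashSet<>(Arrays.asList("der", "und", "ist", "das")));
    }

    @Override
    public String detect(String text) {
        String best = "unknown";
        int bestHits = 0;
        String[] tokens = text.toLowerCase(Locale.ROOT).split("\\W+");
        for (Map.Entry<String, Set<String>> entry : stopwords.entrySet()) {
            int hits = 0;
            for (String token : tokens) {
                if (entry.getValue().contains(token)) {
                    hits++;
                }
            }
            if (hits > bestHits) {
                bestHits = hits;
                best = entry.getKey();
            }
        }
        return best;
    }
}

/** The caller depends only on the interface, so detectors can be swapped. */
class LangIdProcessor {
    private final LanguageDetector detector;

    LangIdProcessor(LanguageDetector detector) {
        this.detector = detector;
    }

    String languageOf(String fieldValue) {
        return detector.detect(fieldValue);
    }
}

class LangIdDemo {
    public static void main(String[] args) {
        LangIdProcessor processor = new LangIdProcessor(new StopwordDetector());
        System.out.println(processor.languageOf("The cat is on the mat"));
        // A different implementation plugs in without touching LangIdProcessor.
        LangIdProcessor other = new LangIdProcessor(text -> "xx");
        System.out.println(other.languageOf("anything"));
    }
}
```

With a shape like this, a Tika-backed detector, the language-detection 
library, or a future port of another detector would each be one more 
implementation of the same interface.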

                
> add alternative language detection impl
> ---------------------------------------
>
>                 Key: SOLR-2839
>                 URL: https://issues.apache.org/jira/browse/SOLR-2839
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 3.5, 4.0
>
>         Attachments: SOLR-2839.patch
>
>
> Based on http://code.google.com/p/language-detection (Apache License); 
> supports 53 languages.
