[ https://issues.apache.org/jira/browse/SOLR-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128409#comment-13128409 ]
Robert Muir commented on SOLR-2839:
-----------------------------------

{quote}
How does this impl compare with the Tika one for short texts? And wouldn't it make more sense to add this on the Tika level letting the detection method be configurable? Then all Tika users would benefit from it.
{quote}

I have no idea, probably not that great? But I didn't compare it to Tika.

Regarding short texts: http://shuyo.wordpress.com/2011/09/29/langdetect-is-updatedadded-profiles-of-estonian-lithuanian-latvian-slovene-and-so-on/

{quote}
And wouldn't it make more sense to add this on the Tika level letting the detection method be configurable? Then all Tika users would benefit from it.
{quote}

If someone wants to do this, then we can remove this implementation at that time. But for Lucene/Solr, I am able to commit to this project, and I think that it's important for langid to be pluggable to different implementations.

For example, maybe someone ports Google's detector (http://src.chromium.org/viewvc/chrome/trunk/src/third_party/cld/) to Java and we expose that too, which might be interesting for short texts.

> add alternative language detection impl
> ---------------------------------------
>
>                 Key: SOLR-2839
>                 URL: https://issues.apache.org/jira/browse/SOLR-2839
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 3.5, 4.0
>
>         Attachments: SOLR-2839.patch
>
>
> based on http://code.google.com/p/language-detection (apache license), supports 53 languages.