[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729250#comment-14729250 ]
Ken Krugler commented on TIKA-1723:
-----------------------------------

Regarding the current detection code: I propose that we leave it in tika-core, with deprecation annotations, unless someone can come up with a good reason why we'd want it to be available via the new API.

> Integrate language-detector into Tika
> -------------------------------------
>
>                 Key: TIKA-1723
>                 URL: https://issues.apache.org/jira/browse/TIKA-1723
>             Project: Tika
>          Issue Type: Improvement
>          Components: languageidentifier
>    Affects Versions: 1.11
>            Reporter: Ken Krugler
>            Assignee: Ken Krugler
>            Priority: Minor
>         Attachments: TIKA-1723-2.patch, TIKA-1723-3.patch, TIKA-1723.patch,
> TIKA-1723v2.patch
>
>
> The language-detector project at
> https://github.com/optimaize/language-detector is faster, supports more languages
> (70 vs. 13), and has better accuracy than the built-in language detector.
> This is a stab at integrating it, with some initial findings. It raises a
> number of issues, especially if [~chrismattmann] moves forward
> with turning language detection into a pluggable extension point.
> I'll add comments with results below.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)