Ken Krugler created TIKA-1723:
---------------------------------

             Summary: Integrate language-detector into Tika
                 Key: TIKA-1723
                 URL: https://issues.apache.org/jira/browse/TIKA-1723
             Project: Tika
          Issue Type: Improvement
    Affects Versions: 1.11
            Reporter: Ken Krugler
            Assignee: Ken Krugler
            Priority: Minor


The language-detector project at https://github.com/optimaize/language-detector
is faster, supports more languages (70 vs. 13), and is more accurate than
Tika's built-in language detector.

This is a first stab at integrating it, with some initial findings. The
integration raises a number of issues, especially if [~chrismattmann] moves
forward with turning language detection into a pluggable extension point.

I'll add comments with results below.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
