Ken Krugler created TIKA-1723:
---------------------------------
Summary: Integrate language-detector into Tika
Key: TIKA-1723
URL: https://issues.apache.org/jira/browse/TIKA-1723
Project: Tika
Issue Type: Improvement
Affects Versions: 1.11
Reporter: Ken Krugler
Assignee: Ken Krugler
Priority: Minor
The language-detector project at https://github.com/optimaize/language-detector
is faster, supports more languages (70 vs. 13), and is more accurate than Tika's
built-in language detector.
This is a first attempt at integrating it, with some initial findings. The
integration raises a number of issues, especially if [~chrismattmann] moves
forward with turning language detection into a pluggable extension point.
I'll add comments with results below.