[jira] [Commented] (TIKA-1723) Integrate language-detector into Tika

Tim Allison (JIRA) Thu, 03 Sep 2015 10:57:06 -0700

    [ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729468#comment-14729468 ]


Tim Allison commented on TIKA-1723:
-----------------------------------

Makes sense.  I proposed moving it over just so that we didn't lose our 
investment in that code, but if Optimaize or another lang-detect package blows 
it out of the water, then it makes sense to abandon it.

Are you generally in agreement with the overall way ahead above (with the 
exception of the handling of legacy code)?

Should we remove legacy detection code in 2.0?  

> Integrate language-detector into Tika
> -------------------------------------
>
>                 Key: TIKA-1723
>                 URL: https://issues.apache.org/jira/browse/TIKA-1723
>             Project: Tika
>          Issue Type: Improvement
>          Components: languageidentifier
>    Affects Versions: 1.11
>            Reporter: Ken Krugler
>            Assignee: Ken Krugler
>            Priority: Minor
>         Attachments: TIKA-1723-2.patch, TIKA-1723-3.patch, TIKA-1723.patch, 
> TIKA-1723v2.patch
>
>
> The language-detector project at 
> https://github.com/optimaize/language-detector is faster, has more languages 
> (70 vs 13) and better accuracy than the built-in language detector.
> This is a stab at integrating it, with some initial findings. There are a 
> number of issues this raises, especially if [~chrismattmann] moves forward 
> with turning language detection into a pluggable extension point.
> I'll add comments with results below.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
