[ https://issues.apache.org/jira/browse/TIKA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14729613#comment-14729613 ]
Tim Allison commented on TIKA-1723:
-----------------------------------
Great. Thank you.
bq. 1. ...Doesn't that get into issues with the classloader, etc? In any case,
I assume that's something Chris A. Mattmann will address in a separate issue,
re making the language detection pluggable.
Yes, and yes. It'll be possible, but it'll take some work in a separate issue.
bq. 4. I think so, though there's a philosophical issue here...should we just
have one built-in implementation, and assume that any others will be separate
plug-ins implemented by somebody else?
Once we go the route of pluggability, we may as well add a wrapper for
[cybozu's|http://mvnrepository.com/artifact/com.cybozu.labs/langdetect/1.1-20120112]
in the tika-lang-detect module...I think. We could cut down on some
configuration in the Solr config with more configuration on our side. :)
Wait... But seriously, I think we should add it, eventually.
> Integrate language-detector into Tika
> -------------------------------------
>
> Key: TIKA-1723
> URL: https://issues.apache.org/jira/browse/TIKA-1723
> Project: Tika
> Issue Type: Improvement
> Components: languageidentifier
> Affects Versions: 1.11
> Reporter: Ken Krugler
> Assignee: Ken Krugler
> Priority: Minor
> Attachments: TIKA-1723-2.patch, TIKA-1723-3.patch, TIKA-1723.patch,
> TIKA-1723v2.patch
>
>
> The language-detector project at
> https://github.com/optimaize/language-detector is faster, has more languages
> (70 vs 13) and better accuracy than the built-in language detector.
> This is a stab at integrating it, with some initial findings. There are a
> number of issues this raises, especially if [~chrismattmann] moves forward
> with turning language detection into a pluggable extension point.
> I'll add comments with results below.