[ https://issues.apache.org/jira/browse/TIKA-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190104#comment-15190104 ]
Paul Ramirez commented on TIKA-1696:
------------------------------------
Trevor has a patch to make this work with Tika 1.11. He mentioned that he
posted the patch, but I'm not seeing it here. I'll hit him up, as it may just
be that he posted it in his GitHub repo.
> Language Identification with Text Processing Toolkit from MITLL
> ---------------------------------------------------------------
>
> Key: TIKA-1696
> URL: https://issues.apache.org/jira/browse/TIKA-1696
> Project: Tika
> Issue Type: New Feature
> Components: languageidentifier
> Reporter: Paul Ramirez
> Assignee: Chris A. Mattmann
> Fix For: 1.13
>
>
> The aim here is to extend the methods for language identification within
> text. MIT Lincoln Labs has an open source library [1] written in Julia.
> Having spoken with the MITLL guys, there is a possibility that there is a
> Scala version of this library, which would make it easier to package in with
> Tika.
> At this point I'm not quite sure how many languages this library supports by
> default, but it can be extended when provided some training data.
> [1] https://github.com/mit-nlp/Text.jl