[
https://issues.apache.org/jira/browse/TIKA-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018158#comment-14018158
]
Chris A. Mattmann commented on TIKA-1319:
-----------------------------------------
Hmm... I'm not so convinced this should be a separate module. When we started
Tika, the goal was for language detection to be part of the core, and it is
there now. Language translation seems close enough to that original intent
that I wonder why we wouldn't make it part of the core too. I share
[[email protected]]'s concern about tying core to a web service, which is why I
suggested turning this into an interface.
I've got the following suggestion:
# Make the Translator interface part of tika-core and put in
org.apache.tika.language.translate
# Create a tika-translate module that includes the one specific implementation
linking to Microsoft's service
# Bubble up anything that seems generic across implementations in
tika-translate to the tika-core package
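
A minimal sketch of how the first two steps might look. Everything here is an assumption for illustration, not the code under review: the method names, the isAvailable() check, and the DictionaryTranslator class are hypothetical. DictionaryTranslator is an in-memory stand-in for the Microsoft-backed implementation that would live in tika-translate, so the example runs without credentials or network access.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface; per the suggestion above it would sit in tika-core
// under org.apache.tika.language.translate.
interface Translator {
    // Translate text given explicit source and target language codes ("en", "fr").
    String translate(String text, String sourceLanguage, String targetLanguage);

    // Translate text to the target language, leaving source detection to the implementation.
    String translate(String text, String targetLanguage);

    // Whether the implementation can be used (for example, credentials are configured).
    boolean isAvailable();
}

// Hypothetical stand-in for a service-backed implementation in tika-translate.
// It only knows the phrases loaded in its constructor.
class DictionaryTranslator implements Translator {
    private final Map<String, String> entries = new HashMap<>();

    DictionaryTranslator() {
        entries.put("en>fr:hello", "salut");
    }

    @Override
    public String translate(String text, String sourceLanguage, String targetLanguage) {
        String hit = entries.get(sourceLanguage + ">" + targetLanguage + ":" + text);
        // Unknown phrases are returned unchanged instead of failing.
        return hit != null ? hit : text;
    }

    @Override
    public String translate(String text, String targetLanguage) {
        // A real implementation would detect the source language; this stub assumes English.
        return translate(text, "en", targetLanguage);
    }

    @Override
    public boolean isAvailable() {
        return true;
    }
}

class TranslatorSketch {
    public static void main(String[] args) {
        Translator translator = new DictionaryTranslator();
        System.out.println(translator.translate("hello", "en", "fr"));
        System.out.println(translator.translate("hello", "fr"));
    }
}
```

With this split, tika-core depends only on the interface, and the web-service dependency stays in the implementation module.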
Thoughts?
> Translation
> -----------
>
> Key: TIKA-1319
> URL: https://issues.apache.org/jira/browse/TIKA-1319
> Project: Tika
> Issue Type: New Feature
> Reporter: Tyler Palsulich
> Assignee: Chris A. Mattmann
> Priority: Minor
>
> I just opened up a review on reviews.apache.org --
> https://reviews.apache.org/r/22219/. I copied the description below.
> This patch adds basic language translation functionality to Tika. Translation
> is provided by a Microsoft API, but accessed through Apache 2 licensed
> com.memetix.microsoft-translator-java-api
> (https://code.google.com/p/microsoft-translator-java-api/ ). If a user wants
> to use the translation feature, they have to add a client id and client
> secret to the
> tika-core/src/main/resources/org/apache/tika/language/translator.properties
> file (see http://msdn.microsoft.com/en-us/library/hh454950.aspx ). I added
> com.memetix as a dependency in tika-core. I put the Translator class in
> org.apache.tika.language. There is no integration with the server or CLI,
> yet. Further, only Strings are translated right now -- if you pass in a full
> document with XML tags, the structure will be mangled. I think handling
> that would be a cool feature, though -- translate the body, title, subtitle,
> etc., but not the structural elements.
> There is still more work to do, but I wanted some more eyes on this to make
> sure I'm heading in the right direction and this is a desired feature. Let me
> know what you think!
> There are two simple unit tests for now, which translate "hello" to French
> ("salut"): one that takes both the source and target languages, and one
> that takes just the target language (and detects the source language
> automatically).