-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22219/
-----------------------------------------------------------

(Updated June 4, 2014, 7:11 p.m.)


Review request for tika and Chris Mattmann.


Changes
-------

Thanks for the input! I updated the patch. Translator is now an interface, and 
MicrosoftTranslator is an implementation of it. I also added another method to 
the interface, isAvailable. This is useful for users who do not use a 
translation service but still want all of the unit tests to pass. Let me know 
if there is anything else I should update!

Tyler
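A minimal sketch of how a Translator interface with isAvailable could be shaped and used. The two translate overloads mirror the two cases described under Testing (explicit source language vs. auto-detected source); the exact method names and signatures, and the UnavailableTranslator and translateIfAvailable helpers, are hypothetical and not taken from the patch.

```java
import java.util.Objects;

/**
 * Hypothetical sketch of a pluggable translation service.
 * Method names and signatures are assumptions, not the patch's actual API.
 */
interface Translator {
    /** Translates text from an explicit source language to a target language. */
    String translate(String text, String sourceLanguage, String targetLanguage) throws Exception;

    /** Translates text, letting the implementation detect the source language. */
    String translate(String text, String targetLanguage) throws Exception;

    /** Returns true only if the service is configured and usable (e.g. credentials present). */
    boolean isAvailable();
}

/** Hypothetical fallback for setups with no translation service configured. */
class UnavailableTranslator implements Translator {
    @Override
    public String translate(String text, String sourceLanguage, String targetLanguage) {
        return text; // no service: return the input unchanged
    }

    @Override
    public String translate(String text, String targetLanguage) {
        return text; // no service: return the input unchanged
    }

    @Override
    public boolean isAvailable() {
        return false;
    }
}

public class TranslatorSketch {
    /** Translates only when the service is available; otherwise returns the original text. */
    static String translateIfAvailable(Translator translator, String text, String target) throws Exception {
        Objects.requireNonNull(translator, "translator");
        return translator.isAvailable() ? translator.translate(text, target) : text;
    }

    public static void main(String[] args) throws Exception {
        Translator translator = new UnavailableTranslator();
        System.out.println(translateIfAvailable(translator, "hello", "fr")); // prints "hello"
    }
}
```

Checking isAvailable before calling translate keeps callers (and tests) from failing outright when no credentials are configured.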


Repository: tika


Description
-------

This patch adds basic language translation functionality to Tika. Translation 
is provided by a Microsoft API, but accessed through the Apache 2.0-licensed 
com.memetix.microsoft-translator-java-api library 
(https://code.google.com/p/microsoft-translator-java-api/). To use the 
translation feature, a user has to add a client ID and client secret to the 
tika-core/src/main/resources/org/apache/tika/language/translator.properties 
file (see http://msdn.microsoft.com/en-us/library/hh454950.aspx). I added 
com.memetix as a dependency in tika-core and put the Translator class in 
org.apache.tika.language. There is no integration with the server or CLI yet. 
Further, only Strings are translated right now: if you pass in a full 
document with XML tags, the structure will be mangled. I think structure-aware 
translation would be a cool feature, though: translate the body, title, 
subtitle, etc., but not the structural elements.
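A minimal sketch of reading the client ID and secret from translator.properties on the classpath. The property keys translator.client-id and translator.client-secret and the TranslatorConfig class are hypothetical; the keys actually used by the patch may differ. Only the resource location is taken from the description above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.Properties;

public class TranslatorConfig {
    // Hypothetical property keys; the patch may use different names.
    static final String CLIENT_ID_KEY = "translator.client-id";
    static final String CLIENT_SECRET_KEY = "translator.client-secret";

    // Classpath location of
    // tika-core/src/main/resources/org/apache/tika/language/translator.properties
    static final String RESOURCE = "/org/apache/tika/language/translator.properties";

    final String clientId;
    final String clientSecret;

    TranslatorConfig(Properties props) {
        this.clientId = trimToNull(props.getProperty(CLIENT_ID_KEY));
        this.clientSecret = trimToNull(props.getProperty(CLIENT_SECRET_KEY));
    }

    /** Loads the file from the classpath; a missing file yields an empty configuration. */
    static TranslatorConfig load() throws IOException {
        Properties props = new Properties();
        try (InputStream in = TranslatorConfig.class.getResourceAsStream(RESOURCE)) {
            if (in != null) {
                props.load(in);
            }
        }
        return new TranslatorConfig(props);
    }

    /** Credentials are usable only if both values are present and non-blank. */
    boolean hasCredentials() {
        return clientId != null && clientSecret != null;
    }

    private static String trimToNull(String s) {
        if (s == null) {
            return null;
        }
        String t = s.trim();
        return t.isEmpty() ? null : t;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader("translator.client-id=my-id\ntranslator.client-secret=\n"));
        TranslatorConfig config = new TranslatorConfig(props);
        System.out.println(config.hasCredentials()); // prints "false": the secret is blank
    }
}
```

Under these assumptions, an isAvailable implementation could simply return hasCredentials(), so an unconfigured checkout reports the service as unavailable instead of failing at request time.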

There is still more work to do, but I wanted some more eyes on this to make 
sure I'm heading in the right direction and this is a desired feature. Let me 
know what you think!
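One possible direction for the structure-aware translation idea in the description: parse the document, translate only its text nodes, and leave elements and attributes untouched. This sketch uses the JDK's DOM and transformer APIs with a plain UnaryOperator standing in for a translator; the class and method names are hypothetical and this is not how the patch works today.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class StructurePreservingTranslation {
    /** Recursively rewrites non-blank text nodes; elements and attributes are left as-is. */
    static void translateTextNodes(Node node, UnaryOperator<String> translate) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue();
            if (!text.trim().isEmpty()) {
                node.setNodeValue(translate.apply(text));
            }
            return;
        }
        // Attributes are not child nodes in the DOM, so they are never visited here.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            translateTextNodes(children.item(i), translate);
        }
    }

    /** Parses XML, translates its text content, and serializes it back without a declaration. */
    static String translateXml(String xml, UnaryOperator<String> translate) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        translateTextNodes(doc.getDocumentElement(), translate);
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in translator; a real one would delegate to a Translator implementation.
        UnaryOperator<String> fakeFrench = s -> s.equals("hello") ? "salut" : s;
        System.out.println(translateXml("<doc><title>hello</title><body>hello</body></doc>", fakeFrench));
    }
}
```

One trade-off of this approach: each text node is translated independently, so a sentence split by inline markup is sent to the service in pieces and loses context.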


Diffs (updated)
-----

  trunk/tika-core/pom.xml 1600418 
  trunk/tika-core/src/main/java/org/apache/tika/Tika.java 1600418 
  trunk/tika-core/src/main/java/org/apache/tika/language/MicrosoftTranslator.java PRE-CREATION
  trunk/tika-core/src/main/java/org/apache/tika/language/Translator.java PRE-CREATION
  trunk/tika-core/src/test/java/org/apache/tika/language/MicrosoftTranslatorTest.java PRE-CREATION

Diff: https://reviews.apache.org/r/22219/diff/


Testing
-------

There are two simple unit tests for now, which translate "hello" to French 
("salut"): one that specifies both the source and target languages, and one 
that specifies only the target language (detecting the source language 
automatically).
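A stdlib-only sketch of the two test cases together with the isAvailable guard, so that the tests are skipped rather than failed when no credentials are configured. The nested Translator interface is a minimal stand-in with assumed signatures, and runTests is a hypothetical helper; the real tests target MicrosoftTranslator and use JUnit.

```java
public class TranslationTestSketch {
    /** Minimal stand-in for the patch's Translator interface (signatures assumed). */
    interface Translator {
        String translate(String text, String sourceLanguage, String targetLanguage);
        String translate(String text, String targetLanguage);
        boolean isAvailable();
    }

    /** Returns "skipped" or "passed"; throws AssertionError on a wrong translation. */
    static String runTests(Translator translator) {
        if (!translator.isAvailable()) {
            return "skipped"; // no service configured: do not fail the build
        }
        // Case 1: source and target languages given explicitly.
        check("salut", translator.translate("hello", "en", "fr"));
        // Case 2: only the target language; the source is detected automatically.
        check("salut", translator.translate("hello", "fr"));
        return "passed";
    }

    /** Compares ignoring case, so "Salut" from a live service would also pass. */
    private static void check(String expected, String actual) {
        if (!expected.equalsIgnoreCase(actual)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        Translator fake = new Translator() {
            @Override
            public String translate(String text, String sourceLanguage, String targetLanguage) {
                return translate(text, targetLanguage);
            }

            @Override
            public String translate(String text, String targetLanguage) {
                return "hello".equals(text) && "fr".equals(targetLanguage) ? "salut" : text;
            }

            @Override
            public boolean isAvailable() {
                return true;
            }
        };
        System.out.println(runTests(fake)); // prints "passed"
    }
}
```

In JUnit 4, the same guard is usually written as Assume.assumeTrue(translator.isAvailable()) at the start of each test, which marks the test as skipped instead of failed.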


Thanks,

Tyler Palsulich
