-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22219/
-----------------------------------------------------------
(Updated June 5, 2014, 4:19 p.m.)


Review request for tika and Chris Mattmann.


Changes
-------

I updated the patch (off of r1600565) and created the tika-translate module. The Translator interface is still in tika-core. MicrosoftTranslator (in tika-translate) is the only implementation of the Translator interface. tika-core/Tika uses SPI to load the Translators listed in the META-INF/services/org.apache.tika.language.translate.Translator file in tika-translate, so tika-core does not depend on tika-translate.

One notable result is that I added a Translator field to TikaConfig, so users can specify a translator in a DOM, get a DefaultTranslator, etc.

I updated some of the JavaDoc, too.


Repository: tika


Description
-------

This patch adds basic language translation functionality to Tika. Translation is provided by a Microsoft API, accessed through the Apache 2 licensed com.memetix.microsoft-translator-java-api (https://code.google.com/p/microsoft-translator-java-api/).

If a user wants to use the translation feature, they have to add a client id and client secret to the tika-core/src/main/resources/org/apache/tika/language/translator.properties file (see http://msdn.microsoft.com/en-us/library/hh454950.aspx).

I added com.memetix as a dependency in tika-core. I put the Translator class in org.apache.tika.language.

There is no integration with the server or CLI yet. Further, only Strings are translated right now: if you pass in a full document with XML tags, the structure will be mangled. I think structure-aware translation would be a cool feature, though: translate the body, title, subtitle, etc., but not the structural elements.

There is still more work to do, but I wanted some more eyes on this to make sure I'm heading in the right direction and that this is a desired feature. Let me know what you think!

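For reviewers less familiar with the service-file loading described under Changes, here is a rough sketch of how an implementation can be discovered at runtime without a compile-time dependency. It is only an illustration, not the code in the patch: the nested Translator interface, its translate signature, the class name TranslatorDiscoverySketch, and the pass-through fallback are all hypothetical stand-ins. Only java.util.ServiceLoader is the real JDK API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class TranslatorDiscoverySketch {

    /**
     * Hypothetical stand-in for the Translator interface.
     * The method signature in the actual patch may differ.
     */
    public interface Translator {
        String translate(String text, String sourceLanguage, String targetLanguage);
    }

    /**
     * Collects every Translator implementation that is registered in a
     * META-INF/services file visible to the given class loader.
     */
    public static List<Translator> loadTranslators(ClassLoader loader) {
        List<Translator> found = new ArrayList<>();
        for (Translator translator : ServiceLoader.load(Translator.class, loader)) {
            found.add(translator);
        }
        return found;
    }

    /**
     * Uses the first discovered translator. If no implementation is on the
     * classpath, the text is returned unchanged (an assumed fallback, chosen
     * here only to keep the sketch self-contained).
     */
    public static String translateOrPassThrough(String text, String source, String target) {
        List<Translator> translators =
                loadTranslators(TranslatorDiscoverySketch.class.getClassLoader());
        if (translators.isEmpty()) {
            return text;
        }
        return translators.get(0).translate(text, source, target);
    }
}
```

With no provider jar on the classpath the list is simply empty, which is why the core module can compile and run without the implementation module present.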
Diffs (updated)
-----

  trunk/pom.xml 1600565
  trunk/tika-core/src/main/java/org/apache/tika/Tika.java 1600565
  trunk/tika-core/src/main/java/org/apache/tika/config/TikaConfig.java 1600565
  trunk/tika-core/src/main/java/org/apache/tika/language/translate/DefaultTranslator.java PRE-CREATION
  trunk/tika-core/src/main/java/org/apache/tika/language/translate/Translator.java PRE-CREATION
  trunk/tika-translate/pom.xml PRE-CREATION
  trunk/tika-translate/src/main/java/org/apache/tika/language/translate/MicrosoftTranslator.java PRE-CREATION
  trunk/tika-translate/src/main/resources/META-INF/services/org.apache.tika.language.translate.Translator PRE-CREATION
  trunk/tika-translate/src/main/resources/org/apache/tika/language/translator.microsoft.properties PRE-CREATION
  trunk/tika-translate/src/test/java/org/apache/tika/language/translate/MicrosoftTranslatorTest.java PRE-CREATION

Diff: https://reviews.apache.org/r/22219/diff/


Testing
-------

There are two simple unit tests for now, both of which translate "hello" to French ("salut"). One passes both the source and target languages; the other passes only the target language and detects the source language automatically.


Thanks,

Tyler Palsulich
