-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22761/#review76314
-----------------------------------------------------------



./trunk/tika-translate/pom.xml
<https://reviews.apache.org/r/22761/#comment123860>

    Hi Chris, did you just build this locally?


- Lewis McGibbney


On June 18, 2014, 10:04 p.m., Chris Mattmann wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22761/
> -----------------------------------------------------------
> 
> (Updated June 18, 2014, 10:04 p.m.)
> 
> 
> Review request for tika.
> 
> 
> Bugs: tika-1343
>     https://issues.apache.org/jira/browse/tika-1343
> 
> 
> Repository: tika
> 
> 
> Description
> -------
> 
> The Joshua Decoder toolkit is a BSD-licensed, Java-based statistical machine 
> translation system hosted on GitHub:
> 
> http://joshua-decoder.org/
> 
> Joshua takes in corpora and trains models that can then be used to do 
> language translation. Currently there is support for, e.g., Spanish->English, 
> Indian dialects->English, Chinese->English, and a few others.
> 
> https://github.com/joshua-decoder/joshua/
> 
> It would be nice to build a Tika Translator on top of Joshua. There are of 
> course several issues with this:
> 
> * the models are huge, so we'll need a separate package or Maven module, 
> maybe tika-translate-joshua or something, to release the models, and we'll 
> need to build the models. I just went through the process of building the 
> Spanish->English one, and it still needs to be rebuilt because I did it 
> wrong, but it took over a day
> * there is a configuration for Joshua, and so we need some way of passing 
> that config into the Translator. Not sure of the best way to do this.
> * Joshua isn't in the Central repository. I've started a discussion on the 
> Joshua lists about this: 
> https://groups.google.com/forum/#!topic/joshua_support/9Y04miboUj0
> 
> Anyhoo, I've got a working patch right now with hard-coded stuff, and a 
> manual install into my Maven repo, for brave souls out there who want to try 
> it.
> 
> 
> Diffs
> -----
> 
>   ./trunk/tika-translate/pom.xml 1603529 
>   ./trunk/tika-translate/src/main/java/org/apache/tika/language/translate/JoshuaTranslator.java PRE-CREATION 
>   ./trunk/tika-translate/src/test/java/org/apache/tika/language/translate/JoshuaTranslatorTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22761/diff/
> 
> 
> Testing
> -------
> 
> Ran it on my locally built Spanish->English corpus, built using 
> http://joshua-decoder.org/data/fisher-callhome-corpus/
> My dataset isn't perfect, but it can do basic translations. I also wrote a 
> unit test, which is part of the patch.
> 
> 
> Thanks,
> 
> Chris Mattmann
> 
>
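
On the open question in the quoted description about how to pass the Joshua
configuration into the Translator, one possible approach is to read the
settings from a properties file instead of hard-coding them. The sketch below
is only an illustration under assumptions: the class JoshuaConfigSketch, the
property keys, and the default model directory are all hypothetical and are
not part of the patch, Tika, or Joshua.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/**
 * Hypothetical holder for Joshua settings. Nothing here comes from the
 * patch under review; names and keys are made up for illustration.
 */
public class JoshuaConfigSketch {
    // Hypothetical property keys (assumptions, not an existing convention).
    static final String CONFIG_KEY = "translator.joshua.config-file";
    static final String MODEL_DIR_KEY = "translator.joshua.model-dir";

    private final String configFile;
    private final String modelDir;

    JoshuaConfigSketch(String configFile, String modelDir) {
        this.configFile = configFile;
        this.modelDir = modelDir;
    }

    /** Builds the settings from properties, failing fast if the config path is missing. */
    static JoshuaConfigSketch fromProperties(Properties props) {
        String configFile = props.getProperty(CONFIG_KEY);
        if (configFile == null || configFile.trim().isEmpty()) {
            throw new IllegalStateException("Missing property: " + CONFIG_KEY);
        }
        // Assumed default: fall back to the current directory for the models.
        String modelDir = props.getProperty(MODEL_DIR_KEY, ".");
        return new JoshuaConfigSketch(configFile.trim(), modelDir.trim());
    }

    String getConfigFile() {
        return configFile;
    }

    String getModelDir() {
        return modelDir;
    }

    public static void main(String[] args) throws IOException {
        // In practice the properties would come from a file on the classpath;
        // an in-memory string keeps this sketch self-contained.
        Properties props = new Properties();
        props.load(new StringReader(
                "translator.joshua.config-file=/path/to/joshua.config\n"));
        JoshuaConfigSketch cfg = fromProperties(props);
        System.out.println(cfg.getConfigFile() + " (models in " + cfg.getModelDir() + ")");
    }
}
```

A translator could then take such an object in its constructor, which would
keep the large models and their paths out of the code and let each
installation point at its own locally built model.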
