[ 
https://issues.apache.org/jira/browse/TIKA-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604274#comment-15604274
 ] 

ASF GitHub Bot commented on TIKA-1343:
--------------------------------------

GitHub user lewismc opened a pull request:

    https://github.com/apache/tika/pull/137

    TIKA-1343 Create a Tika Translator implementation that uses JoshuaDecoder

    This pull request addresses https://issues.apache.org/jira/browse/TIKA-1343 and is 
a cleaner one than the PR I just screwed up over at #112 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lewismc/tika TIKA-1343v2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tika/pull/137.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #137
    
----
commit d4fb28f91d77458b15557942438f874b9f564e88
Author: Lewis John McGibbney <[email protected]>
Date:   2016-04-27T22:06:42Z

    TIKA-1343 Create a Tika Translator implementation that uses JoshuaDecoder

commit 4aff4839aece41a739b93169cf7a475ecfc5c70c
Author: Lewis John McGibbney <[email protected]>
Date:   2016-05-05T21:03:01Z

    Merge branch 'master' into TIKA-1343

commit fe559b80bcad1f107904ca7a89724a26ea2921a1
Author: Lewis John McGibbney <[email protected]>
Date:   2016-07-01T20:35:52Z

    Merge master into TIKA-1343

commit a1250ff33c68065e4a812285dfa6a6bd2a6a22de
Author: Lewis John McGibbney <[email protected]>
Date:   2016-09-21T15:05:35Z

    Improve logging and trivial code conventions

commit d50a69361bd0196fb2595313cb47222f61701ba4
Author: Lewis John McGibbney <[email protected]>
Date:   2016-09-21T15:06:47Z

    Merge branch 'master' into TIKA-1343

commit 5657ae6616cd461a19676952f40082b2ec291dac
Author: Lewis John McGibbney <[email protected]>
Date:   2016-10-24T17:49:26Z

    Merge branch 'TIKA-1343' of https://github.com/lewismc/tika into TIKA-1343

commit dadbf55c51d166846aa0d365fd2ed340b604bfae
Author: Lewis John McGibbney <[email protected]>
Date:   2016-10-25T05:20:04Z

    TIKA-1343 Create a Tika Translator implementation that uses JoshuaDecoder

----


> Create a Tika Translator implementation that uses JoshuaDecoder
> ---------------------------------------------------------------
>
>                 Key: TIKA-1343
>                 URL: https://issues.apache.org/jira/browse/TIKA-1343
>             Project: Tika
>          Issue Type: New Feature
>          Components: translation
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.15
>
>
> The Joshua Decoder toolkit is a BSD-licensed, Java-based statistical machine 
> translation system hosted on GitHub:
> http://joshua-decoder.org/
> Joshua takes in corpora and trains models that can then be used for 
> language translation. It currently supports, e.g., Spanish->English, 
> Indian dialects->English, Chinese->English, and a few others. 
> https://github.com/joshua-decoder/joshua/
> It would be nice to build a Tika Translator on top of Joshua. There are of 
> course several issues with this:
> * the models are huge - so we'll need a separate package or Maven module, 
> maybe tika-translate-joshua or something to release the models and we'll need 
> to build the models. I just went through the process of building the 
> Spanish->English one: it took over a day, and it still needs to be rebuilt 
> because I did it wrong
> * there is a configuration for Joshua, and so we need some way of passing 
> that config into the Translator. Not sure of the best way to do this.
> * Joshua isn't in the Central repository. I've started a discussion on the 
> Joshua lists about this: 
> https://groups.google.com/forum/#!topic/joshua_support/9Y04miboUj0
> Anyhoo, I've got a working patch right now with hard-coded values, and a manual 
> install into my Maven repo for brave souls out there who want to try it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
