Re: Dockerhub hosted images

Matt Post Tue, 07 Mar 2017 08:15:04 -0800

FYI, I stress-tested the Joshua server with the following protocol: for each 
of the TCP and HTTP servers, I started a six-thread server and then sent it 
five simultaneous 16k documents. The translation times were as follows:


TCP: (times: 8:07 8:06 8:06)

        for x in 1 2 3 4; do for num in $(seq 1 5); do cat corpus.es | nc localhost 5674 > t.tcp.$num & done; time wait; done

HTTP: (times: 7:25 7:34 7:20)

        for x in 1 2 3 4; do for num in $(seq 1 5); do /home/hltcoe/mpost/code/joshua/scripts/support/query_http.py -s localhost -p 5674 corpus.es > t.out.$num & done; time wait; done

The HTTP query takes 100 lines of the test set at a time, constructs the 
RESTful query string (with 100 url-encoded "q=..." lines), and sends it to the 
server.
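
As a rough illustration of that batching (a sketch only, not the actual 
query_http.py): the snippet below splits the input into groups of 100 lines 
and builds one URL per group with repeated, url-encoded "q=" parameters. The 
function names, the "/" path, and the use of a plain GET query string are 
assumptions for the example; only the 100-line batches, the "q=" parameters, 
and localhost:5674 come from the description above.

```python
from urllib.parse import urlencode


def batched(lines, size=100):
    """Yield consecutive chunks of at most `size` lines."""
    for start in range(0, len(lines), size):
        yield lines[start:start + size]


def build_query_urls(lines, host="localhost", port=5674, path="/", size=100):
    """Build one URL per batch; each URL carries one q= parameter per line.

    host, port, and path are illustrative defaults (assumptions), not
    values read from the Joshua sources.
    """
    urls = []
    for batch in batched(lines, size):
        # urlencode accepts a list of (key, value) pairs, so the same key
        # can repeat; values are url-encoded (space -> '+', '&' -> '%26').
        query = urlencode([("q", line) for line in batch])
        urls.append(f"http://{host}:{port}{path}?{query}")
    return urls


sample = ["hola mundo", "buenos dias & gracias"]
urls = build_query_urls(sample)
print(urls[0])
```

With a 250-line input this yields three URLs (100, 100, and 50 lines), each of 
which would then be sent to the server as one request.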

So the bottom line is that the HTTP server both offers an extended 
Google-translate-style API (which also supports other things like adding 
rules) and is a bit faster: 7:20 to 7:34 per run versus 8:06 to 8:07 for TCP.
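
To put a number on "a bit faster", the reported times can be averaged. This is 
a small sketch that assumes the times above are minutes:seconds of wall-clock 
time; the helper names are made up for the example.

```python
def to_seconds(stamp):
    """Convert an 'M:SS' string to a number of seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def mean_seconds(stamps):
    """Average a list of 'M:SS' timings, in seconds."""
    values = [to_seconds(s) for s in stamps]
    return sum(values) / len(values)


# Timings copied from the runs reported above.
tcp_mean = mean_seconds(["8:07", "8:06", "8:06"])
http_mean = mean_seconds(["7:25", "7:34", "7:20"])

# Relative reduction in wall-clock time when using the HTTP server.
speedup_pct = 100 * (tcp_mean - http_mean) / tcp_mean
print(f"TCP {tcp_mean:.1f}s, HTTP {http_mean:.1f}s, HTTP faster by {speedup_pct:.1f}%")
```

On these three runs each, the HTTP server comes out about 40 seconds (roughly 
8%) faster on average, which is too few runs to treat as more than indicative.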

I'm documenting the RESTful API here: 
https://cwiki.apache.org/confluence/display/JOSHUA/RESTful+API

matt


> On Mar 3, 2017, at 11:24 AM, Matt Post <p...@cs.jhu.edu> wrote:
> 
> Folks,
> 
> I've updated the code with a few changes that will support Dockerized 
> language packs. The nice thing is that this makes it easy to include KenLM.
> 
> Here are some changes that were made:
> 
> - Joshua now notes what directory the config file was found in and loads 
> relative paths found in the config file relative to that directory 
> automatically. This means you don't have to "cd" to the LP (language pack) 
> directory before running Joshua.
> 
> - I fixed the HTTP server to take multiple "q=" lines, just like the Google 
> translate API. Before, it only took one "q=" line. This should mean (I'll 
> test later today) that the HTTP server can handle throughput essentially at 
> the rates of the TCP server.
> 
> - I added (but haven't pushed yet) the KenLM model files to the language 
> packs. In addition, I added a file "joshua.config.kenlm". These are not used 
> except by Docker.
> 
> - I fixed the docker setup. See the new file:
> 
>       https://github.com/apache/incubator-joshua/blob/master/distribution/docker/kenlm/Dockerfile
> 
> This docker container builds KenLM. It expects to be run with docker 
> mounting an existing language pack at /model, and then starts Joshua with 
> the joshua.config.kenlm file as a server in HTTP mode. See the README file 
> for information:
> 
>       https://github.com/apache/incubator-joshua/tree/master/distribution/docker/kenlm
> 
> If anyone wants to test this out, please do. You can grab an updated language 
> pack (version 3) here:
> 
>       http://cs.jhu.edu/~post/language-packs/apache-joshua-es-en-2017-03-03.tgz
> 
> (Warning: 9 GB)
> 
> matt
> 
> 
>> On Nov 23, 2016, at 10:14 AM, kellen sunderland 
>> <kellen.sunderl...@gmail.com> wrote:
>> 
>> Yeah it should just be 'docker pull kellens/apache-joshua-es-en-2016-10-05'
>> then 'docker run -it kellens/apache-joshua-es-en-2016-10-05 /bin/bash' or
>> something similar.  I think the default command should eventually be to run
>> the http server, so ideally we'd just do 'docker run -p 5674
>> kellens/apache-joshua-es-en-2016-10-05' and that would start up the http
>> server on port 5674.
>> 
>> Good point on Perl + Python, I can add them.
>> 
>> -Kellen
>> 
>> On Wed, Nov 23, 2016 at 3:22 PM, Matt Post <p...@cs.jhu.edu> wrote:
>> 
>>> Okay, I have this with
>>> 
>>>       docker run -it kellens/apache-joshua-es-en-2016-10-05 bash
>>> 
>>> It seems we are missing Perl (./prepare.sh fails), and we should replace
>>> the LanguageModel line with a KenLM instance and build that. I bet we'll
>>> need Python, too.
>>> 
>>> 
>>> 
>>> 
>>>> On Nov 23, 2016, at 8:15 AM, Matt Post <p...@cs.jhu.edu> wrote:
>>>> 
>>>> Kellen, can I bother you to post a few first steps? I've successfully
>>>> pulled this down to my mac but now do not know how to find it, edit it, or
>>>> run it. I'm poring through the documentation and will find it eventually
>>>> but this would save me a bit of time.
>>>> 
>>>> 
>>>>> On Nov 23, 2016, at 8:07 AM, kellen sunderland <kellen.sunderl...@gmail.com> wrote:
>>>>> 
>>>>> Yes my next step was going to be getting it hosted officially.
>>>>> 
>>>>> I'll go ahead and open a ticket.  I think I'll hold off on pushing to the
>>>>> Apache account until I've done a little more testing though.
>>>>> 
>>>>> On Nov 23, 2016 5:22 AM, "lewis john mcgibbney" <lewi...@apache.org> wrote:
>>>>> 
>>>>>> Hi Kellen,
>>>>>> Nice :)
>>>>>> Another option is for us to host these via the Apache account.
>>>>>> https://hub.docker.com/r/apache/
>>>>>> We could then add a badge to our README which points to the Dockerfile(s).
>>>>>> Do you want to open a ticket over on the INFRA Jira for this?
>>>>>> 
>>>>>> On Tue, Nov 22, 2016 at 1:57 PM, <dev-digest-h...@joshua.incubator.apache.org> wrote:
>>>>>> 
>>>>>>> From: kellen sunderland <kellen.sunderl...@gmail.com>
>>>>>>> To: "dev@joshua.incubator.apache.org" <dev@joshua.incubator.apache.org>
>>>>>>> Cc:
>>>>>>> Date: Tue, 22 Nov 2016 22:56:56 +0100
>>>>>>> Subject: Re: Dockerhub hosted images
>>>>>>> Ok, the first image should be properly uploaded now.
>>>>>>> 
>>>>>>> https://hub.docker.com/r/kellens/apache-joshua-es-en-2016-10-05/
>>>>>>> 
>>>>>>> -Kellen
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>> 
>>> 
>
