FYI, I stress-tested the Joshua server with the following protocol: for both the TCP and HTTP servers, I started a six-thread server and then sent five simultaneous 16k documents to each. The translation times were as follows:

TCP (times: 8:07 8:06 8:06):

    for x in 1 2 3 4; do for num in $(seq 1 5); do cat corpus.es | nc localhost 5674 > t.tcp.$num & done; time wait; done

HTTP (times: 7:25 7:34 7:20):

    for x in 1 2 3 4; do for num in $(seq 1 5); do /home/hltcoe/mpost/code/joshua/scripts/support/query_http.py -s localhost -p 5674 corpus.es > t.out.$num & done; time wait; done

The HTTP query takes 100 lines of the test set at a time, constructs the RESTful query string (with 100 url-encoded "q=..." lines), and sends it to the server.

So the bottom line is that the HTTP server both has an extended Google-translate API (which also supports other things like adding rules) and is a bit faster.

I'm documenting the RESTful API here:

    https://cwiki.apache.org/confluence/display/JOSHUA/RESTful+API

matt

> On Mar 3, 2017, at 11:24 AM, Matt Post <p...@cs.jhu.edu> wrote:
>
> Folks,
>
> I've updated the code with a few changes that will support Dockerized
> language packs. The nice thing is that this makes it easy to include KenLM.
>
> Here are some changes that were made:
>
> - Joshua now notes what directory the config file was found in and
>   automatically resolves relative paths in the config file against that
>   directory. This means you don't have to "cd" to the LP (language pack)
>   directory before running Joshua.
>
> - I fixed the HTTP server to take multiple "q=" lines, just like the Google
>   translate API. Before, it only took one "q=" line. This should mean (I'll
>   test later today) that the HTTP server can handle throughput essentially at
>   the rates of the TCP server.
>
> - I added (but haven't pushed yet) the KenLM model files to the language
>   packs. In addition, I added a file "joshua.config.kenlm". These are not used
>   except by Docker.
>
> - I fixed the docker setup.
>   See the new file:
>
>   https://github.com/apache/incubator-joshua/blob/master/distribution/docker/kenlm/Dockerfile
>
> This docker container builds KenLM. It then expects to be run with docker
> mounting an existing language pack to /model. It then runs the
> joshua.config.kenlm file, running it as a server in HTTP mode. See the README
> file for information:
>
>   https://github.com/apache/incubator-joshua/tree/master/distribution/docker/kenlm
>
> If anyone wants to test this out, please do. You can grab an updated language
> pack (version 3) here:
>
>   http://cs.jhu.edu/~post/language-packs/apache-joshua-es-en-2017-03-03.tgz
>
> (Warning: 9 GB)
>
> matt
>
>
>> On Nov 23, 2016, at 10:14 AM, kellen sunderland
>> <kellen.sunderl...@gmail.com> wrote:
>>
>> Yeah it should just be 'docker pull kellens/apache-joshua-es-en-2016-10-05'
>> then 'docker run -it kellens/apache-joshua-es-en-2016-10-05 /bin/bash' or
>> something similar. I think the default command should eventually be to run
>> the http server, so ideally we'd just do 'docker run -p 5674
>> kellens/apache-joshua-es-en-2016-10-05' and that would start up the http
>> server on port 5674.
>>
>> Good point on Perl + Python, I can add them.
>>
>> -Kellen
>>
>> On Wed, Nov 23, 2016 at 3:22 PM, Matt Post <p...@cs.jhu.edu> wrote:
>>
>>> Okay, I have this with
>>>
>>> docker run -it kellens/apache-joshua-es-en-2016-10-05 bash
>>>
>>> It seems we are missing Perl (./prepare.sh fails), and we should replace
>>> the LanguageModel line with a KenLM instance and build that. I bet we'll
>>> need Python, too.
>>>
>>>
>>>> On Nov 23, 2016, at 8:15 AM, Matt Post <p...@cs.jhu.edu> wrote:
>>>>
>>>> Kellen, can I bother you to post a few first steps?
>>>> I've successfully pulled this down to my Mac but now do not know how to
>>>> find it, edit it, or run it. I'm poring through the documentation and
>>>> will find it eventually but this would save me a bit of time.
>>>>
>>>>
>>>>> On Nov 23, 2016, at 8:07 AM, kellen sunderland
>>>>> <kellen.sunderl...@gmail.com> wrote:
>>>>>
>>>>> Yes my next step was going to be getting it hosted officially.
>>>>>
>>>>> I'll go ahead and open a ticket. I think I'll hold off on pushing to the
>>>>> Apache account until I've done a little more testing though.
>>>>>
>>>>> On Nov 23, 2016 5:22 AM, "lewis john mcgibbney" <lewi...@apache.org> wrote:
>>>>>
>>>>>> Hi Kellen,
>>>>>> Nice :)
>>>>>> Another option is for us to host these via the Apache account.
>>>>>> https://hub.docker.com/r/apache/
>>>>>> We could then add a badge to our README which points to the Dockerfile(s).
>>>>>> Do you want to open a ticket over on the INFRA Jira for this?
>>>>>>
>>>>>> On Tue, Nov 22, 2016 at 1:57 PM,
>>>>>> <dev-digest-h...@joshua.incubator.apache.org> wrote:
>>>>>>
>>>>>>> From: kellen sunderland <kellen.sunderl...@gmail.com>
>>>>>>> To: "dev@joshua.incubator.apache.org" <dev@joshua.incubator.apache.org>
>>>>>>> Cc:
>>>>>>> Date: Tue, 22 Nov 2016 22:56:56 +0100
>>>>>>> Subject: Re: Dockerhub hosted images
>>>>>>> Ok, the first image should be properly uploaded now.
>>>>>>>
>>>>>>> https://hub.docker.com/r/kellens/apache-joshua-es-en-2016-10-05/
>>>>>>>
>>>>>>> -Kellen
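
P.S. For anyone who wants to see what the HTTP batching at the top of this message amounts to (100 lines at a time, each url-encoded as its own "q=" parameter), here is a rough sketch. It is not the actual query_http.py; the function names batches and build_query are made up for illustration, and the sketch only builds the query string. The endpoint path and any other parameters are whatever the RESTful API wiki page says, so they are left out here.

```python
from urllib.parse import urlencode


def batches(lines, size=100):
    """Yield consecutive groups of at most `size` lines (hypothetical helper)."""
    for start in range(0, len(lines), size):
        yield lines[start:start + size]


def build_query(sentences):
    """URL-encode each sentence as its own repeated "q=" parameter.

    Assumption: only the "q" parameter is shown; the real script may add more.
    """
    return urlencode([("q", sentence) for sentence in sentences])


if __name__ == "__main__":
    # Tiny stand-in for corpus.es; a batch size of 2 keeps the output short.
    corpus = ["hola mundo", "buenos dias", "hasta luego"]
    for group in batches(corpus, size=2):
        print(build_query(group))
```

Sending one request per batch instead of one per sentence is presumably why the HTTP numbers end up in the same range as the TCP ones.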