The idea is twofold, really.

a) From the docs:

"If you have a Hadoop installation, make sure you’ve set $HADOOP to point to it. For example, if the hadoop command is in /usr/bin, you should type export HADOOP=/usr. Joshua will find the binary and use it to submit to your hadoop cluster. If you don’t have one, just make sure that HADOOP is unset, and Joshua will roll one out for you and run it in standalone mode."

So Joe User wants to train a model but doesn't want to sink their laptop doing so. Equally, they either don't know how to deploy a multi-node Hadoop cluster or don't want to go to the effort. My understanding, having gone through the docs and had a chat with Lewis, is that Thrax will pass the job off to Hadoop. A setup like the one in the video would therefore remove the need for Joshua to roll out a standalone Hadoop setup. Of course, I don't know how Thrax works under the hood. If it doesn't leverage a cluster, this clearly isn't required, but as the docs mention the word cluster, I worked on the assumption that it does.

b) If we ignore all you language geeks, consumers should be able to use Joshua in a variety of situations. I have the runtime version set up in another charm that lets users spin it up, define a language pack to install and configure it, and then chuck translations at it, again in about 3 lines of code for the end user. This is like Google Translate in a box, but without going through the compilation rigmarole, which, again, is something we should be aiming for with end users.

That said, after discussing use cases with Lewis and seeing the talk of APIs and so on, one thing I will be working on in the coming months is a web UI for Joshua. When it's spun up, users could then just dump text into a box or use curl (I know there is some support for that already). Similarly, being able to dump Joshua into a Hadoop cluster for processing data should be something we can do (we may be able to already; I've not looked, although the C stuff makes me wonder).
Also, being able to distribute the Joshua runtime over your cluster would be cool as well.

Tom
--------------
Director Meteorite.bi - Saiku Analytics Founder
Tel: +44(0)5603641316

(Thanks to the Saiku community we reached our Kickstart
<http://kickstarter.com/projects/2117053714/saiku-reporting-interactive-report-designer/>
goal, but you can always help by sponsoring the project
<http://www.meteorite.bi/products/saiku/sponsorship>)

On 20 May 2016 at 10:13, kellen sunderland <[email protected]> wrote:

> Hey Tom, nice work. I'll take a closer look soon but just had a question
> about the use case. Would the idea be that you could use Joshua to
> translate text in a map during a Hadoop job?
>
> -Kellen
>
> On Fri, May 20, 2016 at 12:31 AM, Tom Barber <[email protected]>
> wrote:
>
> > Hi guys
> >
> > I figured this was worth sharing as it's what I was working on whilst sat
> > with Lewis and Kellen at ApacheCon.
> >
> > I'm looking at creating a Juju deployment for Joshua which people can
> > instantly attach to Hadoop to train models. Instead of running Hadoop in
> > standalone mode, I want to be able to simply deploy the same code in the
> > cloud and scale up my training if required (I'm not a translation guy, so
> > I don't know how that would work in terms of real-life performance, but to
> > the sysadmin in me, it makes sense).
> >
> > Anyway, I figured I'd put together a sped-up and cut-up demo that shows
> > the deployment in AWS:
> >
> > https://www.youtube.com/watch?v=dnOQEVSMB-4&feature=youtu.be
> >
> > This deploys Joshua 6.0.5 on its own compute node, plus a multi-node
> > Hadoop cluster (which you can scale with 1 command), and associates the
> > two. I need to finalise the Hadoop client plumbing, but that should be
> > done early next week.
> >
> > Anyway, if there is an appetite for this alongside whatever Docker stuff
> > people are working on, I'll happily commit the charms (the code that runs
> > it) back to the Joshua git repo and we can maintain it in a more
> > "official" manner.
> >
> > Tom
