The example that Grant mentioned was the Thrift/Zookeeper-based classifier service that I did for the book.
It was exceedingly simple. There is a designated node in ZK that holds the URL of a model to load. When that changes, the server(s) suck in the model from that URL using ModelSerializer. They then advertise themselves as ready to serve classifications by sticking ephemeral indicator nodes into another ZK directory. The client looks at the indicator nodes and randomly picks a server to query, and the query is sent via Thrift.

This makes a great example because it shows all the kinds of plumbing needed for a classifier farm. It isn't easily generalized, though, because the classification operation people actually need varies so much. In real life, people need to send 1000 classification requests at once, or run a single data point against 1000 models, or something else entirely. So I don't have a good story for how this could be turned from a nice didactic exercise into a genuinely useful piece of software.

On Mon, Apr 4, 2011 at 10:40 AM, Benson Margulies <[email protected]> wrote:

> Rest services I can do. But is the idea to launch hadoop jobs from a
> rest service?
>
> On Mon, Apr 4, 2011 at 1:23 PM, Ted Dunning <[email protected]> wrote:
> > Very close to AbstractVectorClassifier. Something along the lines
> > of VectorModelClassifier.
> >
> > On Mon, Apr 4, 2011 at 10:08 AM, Jeff Eastman <[email protected]> wrote:
> >
> >> +1 Can you suggest what this API might look like?
> >>
> >> -----Original Message-----
> >> From: Ted Dunning [mailto:[email protected]]
> >> Sent: Monday, April 04, 2011 10:06 AM
> >> To: [email protected]
> >> Cc: Shannon Quinn
> >> Subject: Re: Pitching in
> >>
> >> And it would be beautiful to unify the classifiers and clusterers under a
> >> consistent API so that we can train any clusterer or classifier
> >> and then use it without regard for where it came from or what it is under
> >> the covers.
> >>
> >> On Mon, Apr 4, 2011 at 9:53 AM, Shannon Quinn <[email protected]> wrote:
> >>
> >> > +1
> >> >
> >> > Would love to help with this, too.
> >> >
> >> > Apologies for the brevity, this was sent from my iPhone
> >> >
> >> > On Apr 4, 2011, at 12:45, Daniel McEnnis <[email protected]> wrote:
> >> >
> >> > > Benson,
> >> > >
> >> > > If I could chime in: it would be beautiful if all classifiers,
> >> > > clusterers, and recommendation engines used both text and a (binary or
> >> > > not) vector format for input. At least Naive Bayes, possibly others
> >> > > use something in-between.
> >> > >
> >> > > Daniel McEnnis
> >> > >
> >> > > On Mon, Apr 4, 2011 at 8:12 AM, Benson Margulies <[email protected]> wrote:
> >> > >> So, wanting to offer an excuse not to go emeritus, I wonder at you
> >> > >> all: what's in need of doing that fits my 'plumbing repair' profile?
> >> > >> Ditching or normalizing the 'special' cli? That is, either make a fork
> >> > >> of the never-to-be-released 2.0 commons-cli in our svn and set up to
> >> > >> make a full release of it with our releases, or change our code to
> >> > >> live with the most recent version that did get released. Or something
> >> > >> else?
> >> > >>
> >> > >
> >> >
> >
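For anyone who wants the shape of the ZK discovery scheme described at the top of this message without digging up the book code, here is a rough, hypothetical Python sketch. It is not the code from the book. `FakeZk` is a toy in-memory stand-in for ZooKeeper (not the real client API), the node paths `/classifier/model-url` and `/classifier/servers` are made up, `load_model` stands in for fetching the model from its URL and deserializing it with ModelSerializer, and the Thrift query itself is left out.

```python
import random
from typing import Callable, Dict, List, Optional

MODEL_PATH = "/classifier/model-url"   # hypothetical node holding the model URL
SERVERS_DIR = "/classifier/servers"    # hypothetical directory of indicator nodes


class FakeZk:
    """Toy in-memory stand-in for a ZooKeeper namespace (NOT the real ZK API).

    Supports nodes with data, ephemeral nodes owned by a session, and
    one-shot data watches: just enough to show the discovery pattern.
    """

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.owner: Dict[str, int] = {}  # ephemeral path -> owning session id
        self.watches: Dict[str, List[Callable[[str], None]]] = {}

    def set(self, path: str, value: str) -> None:
        self.data[path] = value
        # Watches fire once and are removed; callbacks must re-register.
        for callback in self.watches.pop(path, []):
            callback(path)

    def get(self, path: str, watch: Optional[Callable[[str], None]] = None) -> str:
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.data[path]

    def create_ephemeral(self, path: str, value: str, session: int) -> None:
        self.data[path] = value
        self.owner[path] = session

    def children(self, directory: str) -> List[str]:
        return sorted(p for p in self.data if p.startswith(directory + "/"))

    def close_session(self, session: int) -> None:
        # Ephemeral nodes disappear when the owning session ends.
        for path in [p for p, s in self.owner.items() if s == session]:
            del self.data[path]
            del self.owner[path]


class ClassifierServer:
    """Loads the model named in ZK, then advertises itself with an ephemeral node."""

    def __init__(self, zk: FakeZk, address: str, session: int,
                 load_model: Callable[[str], object]) -> None:
        self.zk, self.address, self.session = zk, address, session
        self.load_model = load_model  # hypothetical: fetch URL + deserialize
        self.model: object = None
        self._reload(MODEL_PATH)

    def _reload(self, path: str) -> None:
        url = self.zk.get(path, watch=self._reload)  # re-arm the one-shot watch
        self.model = self.load_model(url)
        # Only advertise once a model is actually loaded.
        self.zk.create_ephemeral(f"{SERVERS_DIR}/{self.address}", url, self.session)


def pick_server(zk: FakeZk, rng: Optional[random.Random] = None) -> str:
    """Client side: list the indicator nodes and pick one server at random."""
    if rng is None:
        rng = random.Random()
    live = zk.children(SERVERS_DIR)
    if not live:
        raise RuntimeError("no classifier servers are advertised")
    # The real client would now send the classification request via Thrift.
    return rng.choice(live).rsplit("/", 1)[1]


if __name__ == "__main__":
    zk = FakeZk()
    zk.set(MODEL_PATH, "model-url-v1")  # hypothetical URL
    servers = [ClassifierServer(zk, f"host{i}:9090", i, lambda u: "model@" + u)
               for i in range(3)]
    print("query goes to", pick_server(zk))
```

Because the indicator nodes are ephemeral, a server that dies simply drops out of the list the client chooses from, with no explicit deregistration. Standard ZooKeeper watches are also one-shot, which is why the sketch re-arms the watch on every reload.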
