Sketching out scala traits and 1.0 API

Andrew Palumbo Thu, 29 May 2014 16:07:31 -0700

Hopefully that's not too ambitious a title.

Starting a new thread here to discuss, at least conceptually, possible 
implementations of Scala traits and/or abstract classes for 
Classifiers/Clusterers/Recommenders.  The idea would be to lay these out 
wherever possible so that porting existing algorithms to the Scala DSL, and 
developing new ones in it, is as easy and as uniform as possible.


See below for Ted's initial proposals regarding Classifiers, and for Pat's work 
implementing a Scala-based cooccurrence recommender with a CLI wrapper and 
import/export functionality, plus a proposal for an API to serve recommenders.

Any input is appreciated.



Regarding recommenders, drivers, and import/export:
 
> Subject: Re: do we really need scala still
> From: [email protected]
> Date: Thu, 29 May 2014 08:58:04 -0700
> To: [email protected]
> 
> Regarding recommenders, drivers, and import/export:
> 
> I’ve got Sebastian’s cooccurrence code wrapped with a driver that reads
> text-delimited files into a DRM for use with cooccurrence. Then it writes
> the indicator matrix(es) as text-delimited files with user-specified IDs.
> It also has a proposed Driver base class, a Scala-based option parser, and
> ReadStore/WriteStore traits. The CLI will be mostly a superset of
> itemsimilarity in legacy MR. The read/write stuff is meant to be pretty
> generic, so I was planning to do a DB and maybe a JSON example (some day).
> There is still a bit of functional-programming refactoring to do, and the
> docs are not up to date.
> 
> With cooccurrence working we could do something that replaces all the
> cooccurrence recommenders (in-memory and MR) with one codebase. Add Solr
> and you have a single-machine, server-based recommender that we can supply
> with an API similar to the legacy in-memory recommender. The cool thing is
> that it will scale out to a cluster with Solr and HDFS, requiring only
> config changes. The downside is that it requires at least a standalone
> local version of Spark to do the cooccurrence. BTW this would give us
> something people have been asking for: a recommender service.
> 
> Is anyone else interested in CLI, drivers, read/write in the import/export
> sense? Or a new architecture for the recommenders? If so, maybe a separate
> thread?
> 
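
To make the read/write idea above a bit more concrete, here is a minimal
sketch of what ReadStore/WriteStore traits could look like. Only the two
trait names come from the message above; the method names, the type
parameter, and the delimited-text implementation are hypothetical, and the
sketch uses plain Scala collections rather than a DRM so that it stays
self-contained. It assumes Scala 2.13 or later for the collection converters.

```scala
import java.nio.file.{Files, Path}
import java.util.regex.Pattern
import scala.jdk.CollectionConverters._

// Hypothetical signatures: only the names ReadStore/WriteStore appear in the thread.
trait ReadStore[T] {
  def readFrom(source: Path): T
}

trait WriteStore[T] {
  def writeTo(data: T, dest: Path): Unit
}

// Illustrative store: one (userID, itemID) interaction per line,
// split on a configurable delimiter.
class DelimitedTextStore(delimiter: String = ",")
    extends ReadStore[Seq[(String, String)]]
    with WriteStore[Seq[(String, String)]] {

  def readFrom(source: Path): Seq[(String, String)] =
    Files.readAllLines(source).asScala.toSeq
      .map(_.trim)
      .filter(_.nonEmpty) // skip blank lines
      .map { line =>
        // Quote the delimiter so characters like "|" are not treated as regex.
        val fields = line.split(Pattern.quote(delimiter), -1)
        require(fields.length >= 2, s"expected at least 2 fields: $line")
        (fields(0), fields(1))
      }

  def writeTo(data: Seq[(String, String)], dest: Path): Unit = {
    val lines = data.map { case (user, item) => s"$user$delimiter$item" }
    Files.write(dest, lines.asJava)
  }
}
```

Keeping reading and writing as separate traits would let a DB- or JSON-backed
store implement only the side it needs, which seems to be the point of making
the read/write layer generic.
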
> On May 29, 2014, at 7:03 AM, Ted Dunning <[email protected]> wrote:
> 
> Andrew,
> 
> Sebastian and I were talking yesterday and guessing that you would be
> interested in this soon.  Glad to know the world is as expected.
> 
> Yes. This needs to happen at least at a very conceptual level.  For
> instance, for classifiers, I think that we need to have something like:
> 
>    - progressively train against a batch of data
>         questions: should this do multiple epochs?  throw an exception if
>         on-line training is not supported?  throw an exception if too
>         little data is provided?
> 
>    - classify a batch of data
> 
>    - serialize a model
> 
>    - de-serialize a model
> 
> Note that a batch listed above should be either a bunch of observations or
> just one.
> 
> Question: does this handle the following cases:
> 
> - naive bayes
> - SGD trained on continuous data
> - batch trained <mumble> classifiers
> - downpour type classifier training
> 
> ?
> 
> 
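
The four operations listed above (progressive training, batch
classification, serializing, de-serializing) could be sketched as a trait
along the following lines. This is only one possible shape, not an agreed
design: every name here (Observation, Classifier, ClassifierLoader,
NearestCentroid) is hypothetical, and the toy nearest-centroid model is there
purely so that the trait can be exercised. A batch is a Seq, so it can hold
one observation or many.

```scala
import java.io._
import scala.collection.mutable

// Hypothetical observation type: dense features plus an integer label.
final case class Observation(features: Vector[Double], label: Int)

trait Classifier {
  /** Progressively train against a batch (one observation or many). */
  def train(batch: Seq[Observation]): Unit
  /** Classify a batch; returns one predicted label per input row. */
  def classify(batch: Seq[Vector[Double]]): Seq[Int]
  /** Serialize the model. */
  def save(out: OutputStream): Unit
}

trait ClassifierLoader[C <: Classifier] {
  /** De-serialize a model previously written by save. */
  def load(in: InputStream): C
}

// Toy model: keeps a running sum and count per class, so it supports
// on-line updates; predicts the class with the nearest centroid.
class NearestCentroid(
    dim: Int,
    sums: mutable.Map[Int, Array[Double]] = mutable.Map.empty,
    counts: mutable.Map[Int, Long] = mutable.Map.empty
) extends Classifier {

  def train(batch: Seq[Observation]): Unit = batch.foreach { o =>
    require(o.features.length == dim, "wrong feature dimension")
    val s = sums.getOrElseUpdate(o.label, new Array[Double](dim))
    for (i <- 0 until dim) s(i) += o.features(i)
    counts(o.label) = counts.getOrElse(o.label, 0L) + 1
  }

  def classify(batch: Seq[Vector[Double]]): Seq[Int] = {
    // One possible answer to the "too little data" question: fail loudly.
    if (sums.isEmpty) throw new IllegalStateException("model has not been trained")
    batch.map { x =>
      sums.keys.minBy { label =>
        val n = counts(label).toDouble
        (0 until dim).map(i => math.pow(x(i) - sums(label)(i) / n, 2)).sum
      }
    }
  }

  def save(out: OutputStream): Unit = {
    val d = new DataOutputStream(out)
    d.writeInt(dim)
    d.writeInt(sums.size)
    for ((label, s) <- sums) {
      d.writeInt(label)
      d.writeLong(counts(label))
      s.foreach(d.writeDouble)
    }
    d.flush()
  }
}

object NearestCentroidLoader extends ClassifierLoader[NearestCentroid] {
  def load(in: InputStream): NearestCentroid = {
    val d = new DataInputStream(in)
    val dim = d.readInt()
    val sums = mutable.Map.empty[Int, Array[Double]]
    val counts = mutable.Map.empty[Int, Long]
    for (_ <- 0 until d.readInt()) {
      val label = d.readInt()
      counts(label) = d.readLong()
      sums(label) = Array.fill(dim)(d.readDouble())
    }
    new NearestCentroid(dim, sums, counts)
  }
}
```

Calling train repeatedly with small batches covers the on-line case, and a
batch-only algorithm could throw from train after its first call. Whether the
same shape fits downpour-style training is left open here, as it is in the
question above.
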
> 
> On Wed, May 28, 2014 at 6:25 PM, Andrew Palumbo <[email protected]> wrote:
> 
> > This may be somewhat tangential to this thread, but would now be a good
> > time to start laying out some scala traits for
> > Classifiers/Clusterers/Recommenders?  I am totally scala-naive, but have
> > been trying to keep up with the discussions.
> > 
> > I don't know if this is premature but it seems that now that the DSL data
> > structures have been at least sketched out if not fully implemented,  it
> > would be useful to have these in place before people start porting too much
> > over.  It might be helpful in bringing in new contributions as well.
> > 
> > It could also help regarding people's questions of integrating a future
> > wrapper layer.
> > 
> >
