See how this sounds (listing the requirements):

A Model can be a class with a list of matrices and a list of vectors. Each
algorithm takes care of naming these matrices/vectors and reading and
writing values to them (similar to Datastore).
All Classifiers will work with vectors.
All Trainers will work with vectors.
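A minimal sketch of such a Model in Python, assuming the lists of matrices and vectors are keyed by the name each algorithm chooses (a dict rather than a positional list). The class, the put/get method names and the "label_priors"/"feature_weights" keys are hypothetical, for illustration only.

```python
from typing import Dict, List

# Assumption: a matrix is a list of rows, a vector is a list of floats.
Vector = List[float]
Matrix = List[List[float]]


class Model:
    """Hypothetical container of named matrices and vectors; each algorithm picks the names."""

    def __init__(self) -> None:
        self.matrices: Dict[str, Matrix] = {}
        self.vectors: Dict[str, Vector] = {}

    def put_matrix(self, name: str, matrix: Matrix) -> None:
        self.matrices[name] = matrix

    def get_matrix(self, name: str) -> Matrix:
        return self.matrices[name]

    def put_vector(self, name: str, vector: Vector) -> None:
        self.vectors[name] = vector

    def get_vector(self, name: str) -> Vector:
        return self.vectors[name]


# Usage: an algorithm stores its parameters under names it defines itself.
model = Model()
model.put_vector("label_priors", [0.6, 0.4])
model.put_matrix("feature_weights", [[0.1, 0.9], [0.7, 0.3]])
print(model.get_vector("label_priors"))  # [0.6, 0.4]
```

Keeping the container generic means a new algorithm only decides on its own names; nothing in the storage layer has to change.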

There will be multiple techniques to vectorize data:
- Dictionary based
- Random hashing based
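A sketch of the two vectorization techniques, assuming input text is already split into tokens and term counts are used as feature values. The function names are hypothetical. "Random hashing" is taken here to mean the hashing trick: each token is mapped to an index by a hash function. zlib.crc32 is an arbitrary, deterministic choice; Python's built-in hash() is salted per process, so it would not give stable indices across runs.

```python
import zlib
from typing import Dict, List


def dictionary_vectorize(tokens: List[str], dictionary: Dict[str, int]) -> List[float]:
    """Count each token at the index the dictionary assigns it; unknown tokens are dropped."""
    vec = [0.0] * len(dictionary)
    for tok in tokens:
        idx = dictionary.get(tok)
        if idx is not None:
            vec[idx] += 1.0
    return vec


def hashing_vectorize(tokens: List[str], num_features: int) -> List[float]:
    """Count each token at crc32(token) mod num_features; distinct tokens may collide."""
    vec = [0.0] * num_features
    for tok in tokens:
        idx = zlib.crc32(tok.encode("utf-8")) % num_features
        vec[idx] += 1.0
    return vec


dictionary = {"cheap": 0, "pills": 1, "meeting": 2}
tokens = ["cheap", "cheap", "pills", "unknown"]
print(dictionary_vectorize(tokens, dictionary))  # [2.0, 1.0, 0.0]
print(sum(hashing_vectorize(tokens, 16)))  # 4.0
```

The dictionary approach needs a pass over the data to build the dictionary but gives collision-free indices; the hashing approach needs no dictionary and fixes the vector size up front, at the cost of occasional collisions.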

A Classifier Training Job will take a Trainer and a vector location and
produce a Model.
A Classifier Testing Job will take a Classifier, a Model and a test vector
location and produce statistics.
A Classifier Job will take a Classifier, a Model and a vector location,
label the vectors with probability or likelihood values, and return the top
label or the top N labels.


Model Storage
Datastore has a list of matrices and a list of vectors. It can be serialized
to disk, or stored in HBase or any other hashtable implementation (e.g. memcached).
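A sketch of a serializable Datastore, assuming JSON as the serialization format and treating a hashtable as anything with dict-style get/put of string keys and values (a plain Python dict here; an HBase or memcached client would need a small adapter, not shown). The method names and the key are hypothetical.

```python
import json
import os
import tempfile
from typing import Dict, List, MutableMapping


class Datastore:
    """Named matrices and vectors that can be persisted (JSON format is an assumption)."""

    def __init__(self) -> None:
        self.matrices: Dict[str, List[List[float]]] = {}
        self.vectors: Dict[str, List[float]] = {}

    def to_json(self) -> str:
        return json.dumps({"matrices": self.matrices, "vectors": self.vectors})

    @classmethod
    def from_json(cls, text: str) -> "Datastore":
        data = json.loads(text)
        store = cls()
        store.matrices = data["matrices"]
        store.vectors = data["vectors"]
        return store

    # Disk: the whole store becomes one JSON file.
    def save_to_file(self, path: str) -> None:
        with open(path, "w", encoding="utf-8") as f:
            f.write(self.to_json())

    @classmethod
    def load_from_file(cls, path: str) -> "Datastore":
        with open(path, encoding="utf-8") as f:
            return cls.from_json(f.read())

    # Hashtable: any dict-like get/put store; the whole store goes under one key.
    def save_to_table(self, table: MutableMapping[str, str], key: str) -> None:
        table[key] = self.to_json()

    @classmethod
    def load_from_table(cls, table: MutableMapping[str, str], key: str) -> "Datastore":
        return cls.from_json(table[key])


store = Datastore()
store.vectors["label_priors"] = [0.6, 0.4]
store.matrices["feature_weights"] = [[0.1, 0.9], [0.7, 0.3]]

path = os.path.join(tempfile.mkdtemp(), "model.json")
store.save_to_file(path)
print(Datastore.load_from_file(path).vectors)  # {'label_priors': [0.6, 0.4]}

table: Dict[str, str] = {}  # stand-in for a hashtable-style store
store.save_to_table(table, "model:example")
print(Datastore.load_from_table(table, "model:example").matrices["feature_weights"])
```

Storing the whole store under one key keeps the sketch simple; large models would need to be split across several keys, since memcached, for example, limits item size (1 MB by default).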
