See how this sounds (listing down the requirements):

A Model can be a class with a list of matrices and a list of vectors. Each algorithm takes care of naming these matrices/vectors and of reading and writing values to them (similar to Datastore).

All Classifiers will work with vectors.
All Trainers will work with vectors.

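To make the Model idea above a bit more concrete, here is a minimal sketch. It assumes the matrices and vectors are kept in maps keyed by the name the algorithm chooses. The class name `Model`, its method names, and the example names `feature_label_weights` and `label_counts` are all hypothetical and only there for illustration.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


class Model:
    """Hypothetical container for named matrices and named vectors."""

    def __init__(self) -> None:
        # Assumption: each algorithm picks its own names for these entries.
        self.matrices: Dict[str, Matrix] = {}
        self.vectors: Dict[str, Vector] = {}

    def create_matrix(self, name: str, rows: int, cols: int) -> None:
        # Allocate a zero-filled rows x cols matrix under the given name.
        self.matrices[name] = [[0.0] * cols for _ in range(rows)]

    def create_vector(self, name: str, size: int) -> None:
        # Allocate a zero-filled vector under the given name.
        self.vectors[name] = [0.0] * size

    def set_matrix_value(self, name: str, row: int, col: int, value: float) -> None:
        self.matrices[name][row][col] = value

    def get_matrix_value(self, name: str, row: int, col: int) -> float:
        return self.matrices[name][row][col]

    def set_vector_value(self, name: str, index: int, value: float) -> None:
        self.vectors[name][index] = value

    def get_vector_value(self, name: str, index: int) -> float:
        return self.vectors[name][index]


# Usage: an algorithm names and fills its own matrices/vectors.
model = Model()
model.create_matrix("feature_label_weights", rows=2, cols=3)
model.create_vector("label_counts", size=2)
model.set_matrix_value("feature_label_weights", 1, 2, 0.5)
model.set_vector_value("label_counts", 0, 10.0)
```

Keying by name is only one way to do it; a plain list plus a name-to-index table would satisfy the same requirement.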
Multiple techniques to vectorize data:
- Dictionary based
- Random hashing based

A Classifier Training Job will take a Trainer and a vector location, and produce a Model.
A Classifier Testing Job will take a Classifier, a Model and a test vector location, and produce statistics.
A Classifier Job will take a Classifier, a Model and a vector location, label the vectors with probability or likelihood values, and return 1 or the top N labels.

Model Storage:
The Datastore has a list of matrices and a list of vectors. It can be serialized to disk, or stored on HBase or any other hashtable implementation (e.g. memcached).
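For the two vectorization techniques listed above, a rough sketch may help compare them. It assumes documents are already tokenized and that the vectors hold plain term counts. The function names are hypothetical, and the hashing variant shown is a simple single-hash version (using SHA-256 so the result is stable across runs), which is only one possible reading of "random hashing based".

```python
import hashlib
from typing import Dict, Iterable, List


def build_dictionary(documents: Iterable[List[str]]) -> Dict[str, int]:
    """Assign each distinct token the next free index, in order of first appearance."""
    dictionary: Dict[str, int] = {}
    for tokens in documents:
        for token in tokens:
            if token not in dictionary:
                dictionary[token] = len(dictionary)
    return dictionary


def dictionary_vectorize(tokens: List[str], dictionary: Dict[str, int]) -> List[float]:
    """Dictionary based: one dimension per known token; unknown tokens are skipped."""
    vector = [0.0] * len(dictionary)
    for token in tokens:
        index = dictionary.get(token)
        if index is not None:
            vector[index] += 1.0
    return vector


def hashed_vectorize(tokens: List[str], num_buckets: int) -> List[float]:
    """Hashing based: no dictionary needed; tokens may collide in a bucket."""
    vector = [0.0] * num_buckets
    for token in tokens:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % num_buckets
        vector[index] += 1.0
    return vector


# Usage: both produce fixed-length vectors that a Trainer or Classifier could consume.
docs = [["spam", "offer", "spam"], ["meeting", "offer"]]
dictionary = build_dictionary(docs)
dict_vector = dictionary_vectorize(docs[0], dictionary)
hash_vector = hashed_vectorize(docs[0], num_buckets=16)
```

The dictionary approach needs a pass over the data to build the dictionary first (and somewhere to store it), while the hashing approach fixes the vector size up front at the cost of possible collisions.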
