I agree that models should be highly generic. I just don't think that we should legislate the content of either their internal model or their serialized representation.
The contract is pretty clear, however. There are just a few methods, and it isn't hard for all models to support them, especially with an abstract class providing default implementations.

I would suggest a few oddities in the API that can help a lot with performance. For instance, for two-class models, it should be possible to call the model and get back a double value which is the score for the first class. For k-class models, it should be possible to pass in a vector that receives the scores for the various classes, and both 1-of-k and 1-of-(k-1) encodings should be supported. All of these variants are useful to avoid having to cons up a bunch of extra vectors at classification time.

On Tue, Jun 22, 2010 at 10:07 AM, Robin Anil <[email protected]> wrote:

> The reason I said models be generic is because they can then be read across
> classifiers. Like if the classifier does nearest centroid matching like in
> NB or using output of K-Means it can be used. Or if a margin trained using
> pegasos can be used by any SVM classifier. Thats all
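
A rough sketch of the API variants described above, in case it helps the discussion. Every name here (SketchClassifier, classify, classifyFull, classifyScalar, LogisticSketch) is hypothetical and not taken from any existing code, and the sketch assumes that scores behave like probabilities summing to 1, so that the dropped class in the 1-of-(k-1) encoding can be recovered.

```java
// Hypothetical sketch only: these names are illustrative assumptions.
abstract class SketchClassifier {
    /** Number of target classes k. */
    abstract int numCategories();

    /**
     * 1-of-(k-1) encoding: writes scores for classes 0..k-2 into the
     * caller-supplied array, so no vector is allocated per call.
     */
    abstract void classify(double[] features, double[] scores);

    /**
     * 1-of-k encoding: fills all k entries of the caller-supplied array.
     * Default implementation assumes the k scores sum to 1.
     */
    void classifyFull(double[] features, double[] scores) {
        int k = numCategories();
        classify(features, scores); // fills scores[0..k-2]
        double sum = 0.0;
        for (int i = 0; i < k - 1; i++) {
            sum += scores[i];
        }
        scores[k - 1] = 1.0 - sum;
    }

    /** Two-class shortcut: returns the score for the first class. */
    double classifyScalar(double[] features) {
        if (numCategories() != 2) {
            throw new IllegalStateException("scalar form needs a two-class model");
        }
        double[] one = new double[1]; // default allocates; subclasses can avoid it
        classify(features, one);
        return one[0];
    }
}

// A two-class model that overrides the scalar form so it allocates nothing.
final class LogisticSketch extends SketchClassifier {
    private final double[] weights;

    LogisticSketch(double[] weights) {
        this.weights = weights.clone();
    }

    @Override
    int numCategories() {
        return 2;
    }

    @Override
    void classify(double[] features, double[] scores) {
        scores[0] = classifyScalar(features);
    }

    @Override
    double classifyScalar(double[] features) {
        double dot = 0.0;
        for (int i = 0; i < weights.length; i++) {
            dot += weights[i] * features[i];
        }
        return 1.0 / (1.0 + Math.exp(-dot)); // logistic link
    }
}
```

A caller that scores many instances can allocate one scores array up front and reuse it across calls to classifyFull, which is the point of passing the output vector in rather than returning a new one each time.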
