Hi all, I put together a utility which vectorizes plain old Java objects annotated with @Feature and @Target via Mahout's vector encoders.
See my GitHub branch:
https://github.com/frankscholten/mahout/tree/annotation-based-vectorizer

and the unit test:
https://github.com/frankscholten/mahout/blob/annotation-based-vectorizer/core/src/test/java/org/apache/mahout/classifier/sgd/AnnotationBasedVectorizerTest.java

Use it like this:

    class NewsgroupPost {

        @Target
        private String newsgroup;

        @Feature(encoder = TextValueEncoder.class)
        private String body;

        // Getters & setters
    }

    AnnotationBasedVectorizer<NewsgroupPost> vectorizer =
        new AnnotationBasedVectorizer<NewsgroupPost>(new TypeReference<NewsgroupPost>(){});

Here the vectorizer scans NewsgroupPost's annotations. Then you can do this:

    NewsgroupPost post = ...
    Vector vector = vectorizer.vectorize(post);
    int target = vectorizer.getTarget(post);
    int numFeatures = vectorizer.getNumberOfFeatures();

Note that the vectorize() and getTarget() methods are generically typed, and because of the type token passed to the constructor we can enforce that only NewsgroupPosts are accepted. The vectorizer uses a Dictionary for encoding the target.

Thoughts?

Cheers,
Frank
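
P.S. For anyone who wants a feel for the annotation-scanning idea without checking out the branch, here is a rough standalone sketch using plain reflection. It is not the code in the branch and it does not use Mahout at all. SketchFeature, SketchTarget, SketchVectorizer and SketchPost are hypothetical names made up for this illustration, a plain double[] stands in for Mahout's Vector, a simple hashed bag-of-words stands in for the vector encoders, a Map stands in for the Dictionary, and a Class<T> token replaces the TypeReference.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for @Feature and @Target (not the annotations in the branch).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SketchFeature {}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SketchTarget {}

// Assumption: hashed bag-of-words into a fixed-size double[] instead of Mahout encoders.
class SketchVectorizer<T> {
    private final List<Field> featureFields = new ArrayList<>();
    private final Map<String, Integer> targetDictionary = new LinkedHashMap<>();
    private final int numFeatures;
    private Field targetField;

    SketchVectorizer(Class<T> type, int numFeatures) {
        this.numFeatures = numFeatures;
        // Scan the annotations once, at construction time.
        for (Field field : type.getDeclaredFields()) {
            if (field.isAnnotationPresent(SketchFeature.class)) {
                field.setAccessible(true);
                featureFields.add(field);
            } else if (field.isAnnotationPresent(SketchTarget.class)) {
                field.setAccessible(true);
                targetField = field;
            }
        }
        if (targetField == null) {
            throw new IllegalArgumentException("No @SketchTarget field on " + type.getName());
        }
    }

    // Adds 1.0 at a hashed index for every whitespace-separated token of every feature field.
    double[] vectorize(T object) {
        double[] vector = new double[numFeatures];
        for (Field field : featureFields) {
            Object value = read(field, object);
            if (value == null) {
                continue;
            }
            for (String token : value.toString().toLowerCase().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                int index = Math.floorMod((field.getName() + ":" + token).hashCode(), numFeatures);
                vector[index] += 1.0;
            }
        }
        return vector;
    }

    // Maps each distinct target value to a dense integer id, in order of first appearance.
    int getTarget(T object) {
        String label = String.valueOf(read(targetField, object));
        Integer id = targetDictionary.get(label);
        if (id == null) {
            id = targetDictionary.size();
            targetDictionary.put(label, id);
        }
        return id;
    }

    int getNumberOfFeatures() {
        return numFeatures;
    }

    private static Object read(Field field, Object object) {
        try {
            return field.get(object);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hypothetical POJO mirroring the NewsgroupPost example.
class SketchPost {
    @SketchTarget private final String newsgroup;
    @SketchFeature private final String body;

    SketchPost(String newsgroup, String body) {
        this.newsgroup = newsgroup;
        this.body = body;
    }
}

class SketchDemo {
    public static void main(String[] args) {
        SketchVectorizer<SketchPost> vectorizer = new SketchVectorizer<>(SketchPost.class, 16);
        SketchPost post = new SketchPost("sci.space", "rocket launch rocket");
        System.out.println(java.util.Arrays.toString(vectorizer.vectorize(post)));
        System.out.println(vectorizer.getTarget(post));
    }
}
```

Because the vectorizer is parameterized with T, passing anything other than a SketchPost to vectorize() or getTarget() fails at compile time, which is the same property the type token gives in the example above.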
