Hi,

I'm using Mahout for computer vision, so my pipeline is a bit different from the text-processing pipeline: after acquiring the feature vectors I run a clustering, and once I have the cluster centers and have assigned the original feature vectors to them, I calculate the TF(IDF) vectors. This is a fairly standard approach in computer vision nowadays.
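To make the last step concrete, here is a minimal sketch in plain Java of how TF and TF-IDF vectors can be derived from cluster assignments. It is only an illustration, not the patch itself and not Mahout API: the class name BagOfVisualWords, its method names, the int[][] input layout (one array of cluster IDs per image) and the log(N / df) IDF variant are all assumptions, and Mahout's own weighting may differ.

```java
// Hypothetical helper, not part of Mahout: builds TF and TF-IDF vectors
// from cluster assignments (one cluster ID, i.e. "visual word", per local feature).
final class BagOfVisualWords {

    // assignments[i] holds the cluster ID of every feature extracted from image i.
    // Returns one raw count vector of length numClusters per image.
    static double[][] termFrequencies(int[][] assignments, int numClusters) {
        double[][] tf = new double[assignments.length][numClusters];
        for (int i = 0; i < assignments.length; i++) {
            for (int clusterId : assignments[i]) {
                tf[i][clusterId] += 1.0;
            }
        }
        return tf;
    }

    // Assumed IDF variant: log(N / df). Clusters that occur in no image get weight 0.
    static double[] inverseDocumentFrequencies(double[][] tf) {
        int numDocs = tf.length;
        int numClusters = numDocs == 0 ? 0 : tf[0].length;
        double[] idf = new double[numClusters];
        for (int c = 0; c < numClusters; c++) {
            int df = 0;
            for (double[] doc : tf) {
                if (doc[c] > 0) {
                    df++;
                }
            }
            idf[c] = df == 0 ? 0.0 : Math.log((double) numDocs / df);
        }
        return idf;
    }

    // Multiplies each term frequency by the IDF weight of its cluster.
    static double[][] tfIdf(double[][] tf) {
        double[] idf = inverseDocumentFrequencies(tf);
        double[][] weighted = new double[tf.length][];
        for (int i = 0; i < tf.length; i++) {
            weighted[i] = new double[tf[i].length];
            for (int c = 0; c < tf[i].length; c++) {
                weighted[i][c] = tf[i][c] * idf[c];
            }
        }
        return weighted;
    }
}
```

For example, calling termFrequencies with two images whose features fall into clusters {0, 0, 1} and {1, 2} gives one count vector per image, and tfIdf then zeroes out cluster 1 because it appears in both images.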
So I've implemented the part that creates TF(IDF) vectors from the cluster output, based on the DocumentVectorizer class. If anybody thinks it would be good to have this tool in Mahout, let me know and I'll create a JIRA issue for it and upload my patches there.

Cheers,
Viktor
