On Mon, Nov 15, 2010 at 5:36 AM, Drenski <[email protected]> wrote:
> My goal is to do some clustering of those
> documents. As input for this clustering
> i need a list of feature vectors and each
> feature vector represents a single
> document. I implemented the clustering as
> an annotator. So my first guess was to use
> a collection reader to read these documents
> and put each document in a list which i can
> use for the clustering. But i can't figure out
> where and how to store those documents, so that
> i can use them after all of them are read,
> because the collection reader reads one document
> and then sends it to the annotator.
> Regards,
> Drenski

One way to do this would be to have the collection reader put a single document into each CAS. An annotator would then process the document into a feature vector and store that vector in the CAS. A final annotator (a CAS consumer) would read the feature vector from each CAS and append it to a local array. Once all documents have been read, the collectionProcessComplete call would tell the final annotator to run the clustering step over the accumulated vectors.

Eddie
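
The accumulate-then-cluster pattern described above can be sketched roughly as follows. This is a framework-free illustration only: it does not use any UIMA classes, and the class name ClusteringAccumulator, the double[] feature vectors, and the simple "leader" clustering with a distance radius are all assumptions made for the example. In a real pipeline the two methods would correspond to the final annotator's per-CAS process step and its collectionProcessComplete callback, and the feature vector would be read from the CAS rather than passed in directly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the final annotator (CAS consumer).
// No UIMA types are used; the method names only mirror the callbacks.
class ClusteringAccumulator {
    private final List<double[]> vectors = new ArrayList<>();
    private final double radius;          // assumed clustering parameter
    private List<Integer> assignments;    // filled in once all documents are seen

    ClusteringAccumulator(double radius) {
        this.radius = radius;
    }

    // Called once per document: just remember its feature vector.
    void process(double[] featureVector) {
        vectors.add(featureVector.clone());
    }

    // Called once after the last document: now the full list is available.
    void collectionProcessComplete() {
        assignments = leaderCluster(vectors, radius);
    }

    List<Integer> getAssignments() {
        return assignments;
    }

    // Placeholder algorithm (an assumption, not the original poster's method):
    // each vector joins the first cluster whose leader lies within `radius`,
    // otherwise it becomes the leader of a new cluster.
    static List<Integer> leaderCluster(List<double[]> data, double radius) {
        List<double[]> leaders = new ArrayList<>();
        List<Integer> labels = new ArrayList<>();
        for (double[] v : data) {
            int label = -1;
            for (int i = 0; i < leaders.size(); i++) {
                if (distance(v, leaders.get(i)) <= radius) {
                    label = i;
                    break;
                }
            }
            if (label < 0) {
                leaders.add(v);
                label = leaders.size() - 1;
            }
            labels.add(label);
        }
        return labels;
    }

    // Euclidean distance; assumes both vectors have the same length.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
```

The point of the design is that nothing is clustered while documents are still arriving; the per-document step only stores data, and all the work that needs the complete list is deferred to the end-of-collection callback.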
