On Mon, Nov 15, 2010 at 5:36 AM, Drenski <[email protected]> wrote:
> My goal is to do some clustering of those
> documents. As input for this clustering
> I need a list of feature vectors and each
> feature vector represents a single
> document. I implemented the clustering as
> an annotator. So my first guess was to use
> a collection reader to read these documents
> and put each document in a list which I can
> use for the clustering. But I can't figure out
> where and how to store those documents, so that
> I can use them after all of them are read,
> because the collection reader reads one document
> and then sends it to the annotator.
> Regards,
> Drenski
>

One way to do this would be to have the collection reader put
a single document into each CAS; then an annotator would
process the document into a feature vector and put it into
the CAS; a final annotator (a CAS consumer) would read
the feature vector from each CAS and store it in a local
array. Once every document has been processed, the framework
calls collectionProcessComplete() on the final annotator, which
is where the clustering step would run.
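
A rough sketch of that accumulate-then-finish pattern, in plain Java
so it runs without UIMA on the classpath. VectorCollector and its
method names are made up for illustration: in a real component,
process(...) would take a JCas and read the feature vector from it,
and finish() would be the collectionProcessComplete() override. The
clustering itself is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the final annotator / CAS consumer.
// Not a UIMA class: it only models the per-document and end-of-collection calls.
class VectorCollector {
    // Lives for the whole run, so it survives across per-document calls.
    private final List<double[]> vectors = new ArrayList<>();

    // Called once per document: keep a copy of that document's feature vector.
    void process(double[] featureVector) {
        vectors.add(featureVector.clone());
    }

    // Called once after the last document: every vector is now available.
    // A real component would run the clustering here; this sketch just
    // returns the collected vectors as a matrix (one row per document).
    double[][] finish() {
        return vectors.toArray(new double[0][]);
    }
}
```

The point is that the list is an instance field of the final
component, so nothing has to be stored outside the pipeline.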

Eddie
