Eddie Epstein <eaepst...@...> writes:

> Is the analysis of each document to be done independently of
> the others? For example, annotation offsets are relative to the
> beginning of each document. If not, the documents can be
> concatenated together and analyzed at the same time.
>
> If the documents are to be considered independently, the
> annotator has to process each separately. One could
> create a view for each document and let the annotator
> iterate over all views. Of course, since the CAS is memory
> resident, there is a natural limit to the total size of all
> documents to be processed in this way.
>
> On Sun, Nov 14, 2010 at 10:10 AM, Drenski <milen_dren...@...> wrote:
> > Hi,
> > I am new to UIMA and I have been struggling for some time
> > with the following problem.
> > I have some documents which I need to process simultaneously.
> > So I implemented a collection reader which reads all the files
> > from a directory and annotates them as Documents. But how can
> > I put all these files in an array, for example, so that I can
> > iterate over them and do my further processing? Basically I
> > just want to fetch the files from the directory and put
> > them in an array so that I can process them.
> > Is a CAS consumer what I need? I saw in the doc that
> > it is now deprecated. Or should I use some index like Lucene?
> > But I guess this would be too complex for my simple task?
> > I would appreciate any suggestions.
> > Regards,
> > Drenski

Thank you for your reply!

My goal is to cluster these documents. The clustering takes a list of
feature vectors as input, where each feature vector represents a single
document, and I have implemented it as an annotator. My first guess was
to use a collection reader to read the documents and put each one in a
list that the clustering can use. But I can't figure out where and how
to store the documents so that I can use them after all of them have
been read, because the collection reader reads one document and then
sends it to the annotator.

Regards,
Drenski
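
One way to picture the "store now, cluster later" requirement is a component that builds one feature vector per call and only hands back the full list once the last document has been seen. The sketch below is plain Java and does not use the UIMA API: the class AccumulatingAnnotatorSketch, its bag-of-words features, and the String-based process method are all hypothetical stand-ins. The method names mirror the per-document process call and the end-of-collection collectionProcessComplete callback of UIMA analysis components, on the assumption that a single instance of the component sees every document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an annotator; not a UIMA class.
class AccumulatingAnnotatorSketch {

    // One feature vector (token -> count) per document, in arrival order.
    private final List<Map<String, Integer>> featureVectors = new ArrayList<>();

    // Called once per document, the way a framework would call process(...).
    // Here the "document" is just its text; features are simple token counts.
    void process(String documentText) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : documentText.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        featureVectors.add(counts);
    }

    // Called once after the last document. The clustering would start here,
    // because only now is the complete list of vectors available.
    List<Map<String, Integer>> collectionProcessComplete() {
        return new ArrayList<>(featureVectors);
    }

    public static void main(String[] args) {
        AccumulatingAnnotatorSketch annotator = new AccumulatingAnnotatorSketch();
        annotator.process("the cat sat on the mat");
        annotator.process("the dog barked");
        List<Map<String, Integer>> vectors = annotator.collectionProcessComplete();
        System.out.println(vectors.size() + " vectors ready for clustering");
    }
}
```

The vectors live in an instance field, so they stay in memory for the whole run; as noted earlier in the thread, that puts a natural limit on how many documents can be handled this way.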
