Hello,

I'm having problems when several threads read annotations from the same
(default) CAS index within a single call to the process method. Since
I'm new to UIMA, I'm not sure how to interpret this: is it normal
behaviour caused by incorrect usage, or a bug? The exception stack trace is:

java.lang.IndexOutOfBoundsException: Index: 0, Size: 3
	at java.util.ArrayList.RangeCheck(ArrayList.java:547)
	at java.util.ArrayList.get(ArrayList.java:322)
	at org.apache.uima.cas.impl.FSIndexRepositoryImpl$LeafPointerIterator.initPointerIterator(FSIndexRepositoryImpl.java:628)
	at org.apache.uima.cas.impl.FSIndexRepositoryImpl$LeafPointerIterator.<init>(FSIndexRepositoryImpl.java:636)
	at org.apache.uima.cas.impl.FSIndexRepositoryImpl$LeafPointerIterator.<init>(FSIndexRepositoryImpl.java:612)
	at org.apache.uima.cas.impl.FSIndexRepositoryImpl.createPointerIterator(FSIndexRepositoryImpl.java:158)
	at org.apache.uima.cas.impl.FSIndexRepositoryImpl$IndexImpl.iterator(FSIndexRepositoryImpl.java:792)
	at org.apache.uima.cas.impl.AnnotationIndexImpl.iterator(AnnotationIndexImpl.java:97)
	at fr.lipn.uima.testing.TestConcurrentCASAccesAE.getFSIterator(TestConcurrentCASAccesAE.java:59)

I managed to isolate the problem and wrote a simple AE that reproduces
it (attached below).

Thanks for your help (and sorry if I missed something in the documentation!)

Erwan


package fr.lipn.uima.testing;

import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.FSIterator;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;

/**
 * 
 * This AE is intended to trigger an error caused by concurrent calls to <code>getAnnotationIndex(Annotation.type).iterator()</code> on the same CAS.
 * This error may be a bug in UIMA, or a consequence of incorrect usage.
 * It creates random annotations, then starts a few threads that simply try to read them.
 * 
 * Since the process is not deterministic, it is sometimes necessary to run the AE several times before the error shows up. I used a script like:
 * <pre>
 * num=0
 * while true; do
 *   num=$((num+1))
 *   echo "Try $num"
 *   $UIMA_HOME/bin/runCPE.sh desc/test-Threads-CPE.xml >output.tmp 2>error.tmp
 *   x=$(cat error.tmp)
 *   if [ ! -z "$x" ]; then
 *     echo "Error found, stopping."
 *     break
 *   fi
 * done
 * </pre>
 * 
 * where <code>test-Threads-CPE.xml</code> is a simple CPE including a descriptor for this AE (the AE needs no parameters and no custom type system).
 * The error usually appears within fewer than 10 tries.
 * 
 * See the comments on {@link #getFSIterator(JCas)} below for more details.
 * 
 * @author moreau
 *
 */
public class TestConcurrentCASAccesAE extends JCasAnnotator_ImplBase {
	
	static double annotProbByChar = .2;
	static int annotMaxLength = 10;
	static int maxNbThreads = 20;
	int nb = 0;
	Thread[] threads;
	
	public void process(JCas aJCas) throws AnalysisEngineProcessException {
		annotate(aJCas);
		int nbThreads = (int) (Math.random()*maxNbThreads);
		threads = new Thread[nbThreads];
		for (int i = 0; i < nbThreads; i++) {
			Thread t = new Thread(new CASReaderThread(aJCas, i));
			threads[i] = t;
			t.start();
		}
		for (int i = 0; i < nbThreads; i++) { // wait for all threads to die
			try {
				threads[i].join(); // returns immediately if this thread has already terminated
			} catch (InterruptedException e) {
				Thread.currentThread().interrupt(); // preserve the interrupt status
				throw new AnalysisEngineProcessException(e);
			}
			System.out.println("Main thread: all threads with id lower than " + (i + 1) + " have terminated");
		}
		
	}
	
	/**
	 * Asks for an iterator over the annotation index.
	 * 
	 * IMPORTANT: if this method is made synchronized, a different error appears, this time thrown by the call to next().
	 *            This second error seems to occur more rarely. When the following method getNextAnnotation is also
	 *            synchronized, no error seems to appear anymore (tested with more than 6000 successful runs).
	 * @param aJCas the CAS whose annotation index is read
	 * @return an iterator over the annotations of aJCas
	 */
//	public synchronized FSIterator<Annotation> getFSIterator(JCas aJCas) {
	public FSIterator<Annotation> getFSIterator(JCas aJCas) {
		return aJCas.getAnnotationIndex(Annotation.type).iterator();
	}

//	public synchronized Annotation getNextAnnotation(FSIterator<Annotation> iter) {
	public Annotation getNextAnnotation(FSIterator<Annotation> iter) {
		if (iter.hasNext()) {
			return iter.next();			
		} else {
			return null;
		}
	}
	
	/**
	 * Randomly adds annotations to the CAS indexes.
	 * 
	 * @param aJCas the CAS to annotate
	 */
	public void annotate(JCas aJCas) {
		int l = aJCas.getDocumentText().length();
		for (int i=0; i< l; i++) {
			if (Math.random() <= annotProbByChar) {
				Annotation myAnnot = new Annotation(aJCas);
				myAnnot.setBegin(i);
				int end = i+((int) (Math.random()*annotMaxLength));
				if (end <= l) {
					myAnnot.setEnd(end);
				} else {
					myAnnot.setEnd(l);
				}
				
				myAnnot.addToIndexes();
				nb++;
			}
		}
		System.out.println(nb+" annotations written.");
	}

	/**
	 * 
	 * Each reader thread asks for its own iterator over the annotations, then reads them one by one.
	 * @author moreau
	 *
	 */
	public class CASReaderThread implements Runnable {
		
		JCas myJCas;
		int id;
		
		public CASReaderThread(JCas aJCas, int id) {
			myJCas = aJCas;
			this.id = id;
		}
		
		public void run() {
			System.out.println("Thread "+id+" starting");
			FSIterator<Annotation> iter = getFSIterator(myJCas);
			int num=0;
			Annotation a = getNextAnnotation(iter);
			while (a != null) {
				System.out.println("Thread id=" + id + " reading annotation " + (num++) + " at position " + a.getBegin());
				a = getNextAnnotation(iter);
			}
			/*
			while (iter.hasNext()) {
				System.out.println("Thread id=" + id + " reading annotation " + (num++) + " at position " + iter.next().getBegin());
			}
			*/
			System.out.println("Thread "+id+" terminating");
		}
		
	}
	
}
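
The workaround noted in the Javadoc of getFSIterator (making both getFSIterator and getNextAnnotation synchronized) comes down to routing every iterator creation and every hasNext()/next() step through one shared lock. Below is a minimal sketch of that pattern in plain Java, not UIMA code: GuardedReader, newIterator, nextOrNull and readAll are hypothetical names, an ordinary Iterable stands in for the annotation index, and the sketch assumes the source contains no null elements (null is the end marker, as in getNextAnnotation). It does not establish whether calls such as getBegin() on the returned annotations are safe outside the lock.

```java
import java.util.Iterator;

/**
 * Hypothetical wrapper that serializes every access to a shared, possibly
 * non-thread-safe source through one dedicated lock, mirroring the
 * "synchronize getFSIterator and getNextAnnotation" workaround.
 */
final class GuardedReader<T> {
    private final Object lock = new Object(); // one monitor shared by all readers
    private final Iterable<T> source;         // stand-in for the CAS annotation index

    GuardedReader(Iterable<T> source) {
        this.source = source;
    }

    /** Counterpart of getFSIterator: the iterator is created under the lock. */
    Iterator<T> newIterator() {
        synchronized (lock) {
            return source.iterator();
        }
    }

    /** Counterpart of getNextAnnotation: hasNext() and next() run as one atomic step. */
    T nextOrNull(Iterator<T> it) {
        synchronized (lock) {
            return it.hasNext() ? it.next() : null; // assumes the source holds no null elements
        }
    }

    /** Starts nbThreads readers, each with its own iterator; returns how many items each one read. */
    static <E> int[] readAll(Iterable<E> source, int nbThreads) throws InterruptedException {
        GuardedReader<E> reader = new GuardedReader<>(source);
        int[] counts = new int[nbThreads];
        Thread[] threads = new Thread[nbThreads];
        for (int i = 0; i < nbThreads; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                Iterator<E> it = reader.newIterator();
                while (reader.nextOrNull(it) != null) {
                    counts[id]++; // each thread writes only its own slot
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join(); // join() also makes the worker's writes to counts visible here
        }
        return counts;
    }
}
```

A dedicated lock object behaves like the synchronized methods of the AE (all reader threads share the AE instance as their monitor) while keeping the locking explicit; the cost is that the readers no longer advance their iterators in parallel.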

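If, as the stack traces suggest, CAS index iterators cannot safely be created or advanced from several threads at once, another option is to keep the CAS away from the reader threads entirely: the thread running process() copies the data the readers need into immutable objects before starting them. The sketch below assumes the readers only need begin/end offsets; SnapshotReaders, Span and readConcurrently are hypothetical names, and a plain List stands in for the annotation index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch: the thread that owns the data copies what the workers
 * need into immutable objects, and only that copy is shared between threads.
 */
final class SnapshotReaders {

    /** Immutable stand-in for the begin/end offsets of one annotation. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /**
     * Copies source once on the calling thread, then lets nbThreads workers scan the
     * copy concurrently. Returns, for each worker, the summed length of the spans it read.
     */
    static int[] readConcurrently(List<Span> source, int nbThreads) throws InterruptedException {
        // Single-threaded copy: in the AE this would be built inside process(), before any thread starts.
        final List<Span> snapshot = Collections.unmodifiableList(new ArrayList<>(source));
        final int[] totals = new int[nbThreads];
        Thread[] threads = new Thread[nbThreads];
        for (int i = 0; i < nbThreads; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                int total = 0;
                for (Span s : snapshot) { // each loop gets its own iterator over the read-only list
                    total += s.end - s.begin;
                }
                totals[id] = total; // each worker writes only its own slot
            });
            threads[i].start(); // start() publishes the snapshot to the new thread
        }
        for (Thread t : threads) {
            t.join(); // join() makes the worker's write to totals visible here
        }
        return totals;
    }
}
```

In the AE, the list passed to readConcurrently would be filled inside process(), on the main thread, with the same getAnnotationIndex(Annotation.type).iterator() loop the readers use today, creating one Span from getBegin() and getEnd() per annotation. Because the workers never touch the CAS, no locking is needed, at the price of one copy per document.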