[jira] [Commented] (UIMA-4049) The curious case of the zombie annotation

Marshall Schor (JIRA) Tue, 14 Oct 2014 14:11:13 -0700

    [ 
https://issues.apache.org/jira/browse/UIMA-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171533#comment-14171533
 ]


Marshall Schor commented on UIMA-4049:
--------------------------------------

I agree with Richard's assessment, in general.

Some additional observations:

Adding annotations to, or removing them from, an index while iterating over it 
can confuse the mechanisms that track where the iterator is.  You can partly 
get around this by doing a move-to-first or move-to-last operation, which 
doesn't depend on knowing where you are.  See the Javadocs for FSIterator, 
where some of the moveTo... operations say they signal a 
ConcurrentModificationException if the underlying indexes being iterated over 
were modified, while others say "Allowed even if the underlying indexes being 
iterated over were modified."

So, indexes can be updated while iterating over them, sort of, if you "reset" 
the iterator afterwards by doing a move to first or last.

There is prominent feedback for some (but not all) things: if you add/remove 
FSs to/from an index while iterating over it and then attempt to move (unless 
you use the "reset" kind of motion), you get a 
ConcurrentModificationException.
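
To make the "reset" behaviour concrete, here is a minimal plain-Java sketch of 
a fail-fast iterator that can be resynchronized.  It is not UIMA's 
implementation: VersionedIndex, ResettableIterator and the modCount field are 
hypothetical names made up for illustration; only the method names moveToFirst, 
moveToNext, isValid and get are borrowed from FSIterator.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical model of an index that counts its own modifications.
class VersionedIndex<T> {
    final List<T> items = new ArrayList<>();
    int modCount = 0;

    void add(T item)    { items.add(item); modCount++; }
    void remove(T item) { if (items.remove(item)) modCount++; }

    ResettableIterator<T> iterator() { return new ResettableIterator<>(this); }
}

// Hypothetical iterator: relative moves fail fast, "reset" moves resync.
class ResettableIterator<T> {
    private final VersionedIndex<T> index;
    private int pos;
    private int seenModCount;

    ResettableIterator(VersionedIndex<T> index) {
        this.index = index;
        moveToFirst();
    }

    // "Reset" motion: does not depend on the old position, so it is
    // always allowed and picks up the current state of the index.
    void moveToFirst() {
        pos = 0;
        seenModCount = index.modCount;
    }

    // Relative motion: depends on the old position, so refuse to move
    // if the index changed since the last reset.
    void moveToNext() {
        if (seenModCount != index.modCount) {
            throw new ConcurrentModificationException();
        }
        pos++;
    }

    boolean isValid() { return pos >= 0 && pos < index.items.size(); }

    T get() { return index.items.get(pos); }
}

class ResetDemo {
    public static void main(String[] args) {
        VersionedIndex<String> index = new VersionedIndex<>();
        index.add("Ich");
        index.add("bin");
        index.add("Franz");

        ResettableIterator<String> it = index.iterator();
        index.remove("bin");              // modify while iterating

        try {
            it.moveToNext();              // relative move on a stale iterator
        } catch (ConcurrentModificationException e) {
            System.out.println("stale iterator detected");
        }

        it.moveToFirst();                 // reset, then moving works again
        it.moveToNext();
        System.out.println(it.get());     // prints "Franz"
    }
}
```

Note that this kind of counter only notices adds and removes.  Changing a key 
feature of an FS that is already in the index does not bump the counter, so it 
goes undetected.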

To catch the additional kinds of errors, we could modify the base system to 
record for each FS whether or not it was "added-to-the-indexes".  This was 
discussed in a previous Jira (see 
https://issues.apache.org/jira/browse/UIMA-3399).

If we had a flag which could tell if some arbitrary FS was 
"added-to-the-indexes", then we could signal an error if one or more of the 
features used as index keys was modified.  There's a performance impact to 
consider; but I guess we could mitigate that by having an "assert" mode like 
Java does - and only doing this checking if enabled.
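
A rough plain-Java sketch of that idea, under stated assumptions: TrackedToken, 
SortedIndex, the inIndex flag and the checkingEnabled switch are all 
hypothetical and are not part of UIMA.  The index is modeled with a 
java.util.TreeSet sorted on "begin", standing in for a sorted annotation index.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical stand-in for an FS with a single index key ("begin").
class TrackedToken {
    // Hypothetical global switch, analogous to running Java with -ea.
    static boolean checkingEnabled = true;

    private int begin;
    final String text;
    boolean inIndex = false;   // the "added-to-the-indexes" flag

    TrackedToken(int begin, String text) {
        this.begin = begin;
        this.text = text;
    }

    int getBegin() { return begin; }

    void setBegin(int newBegin) {
        // Only pay for the check when the mode is switched on.
        if (checkingEnabled && inIndex) {
            throw new IllegalStateException(
                    "index key modified while '" + text + "' is indexed");
        }
        begin = newBegin;
    }
}

// Hypothetical sorted index that maintains the flag on add and remove.
class SortedIndex {
    private final TreeSet<TrackedToken> tokens =
            new TreeSet<>(Comparator.comparingInt(TrackedToken::getBegin));

    void add(TrackedToken t)    { if (tokens.add(t)) t.inIndex = true; }
    void remove(TrackedToken t) { if (tokens.remove(t)) t.inIndex = false; }
    int size()                  { return tokens.size(); }
}

class FlagDemo {
    public static void main(String[] args) {
        SortedIndex index = new SortedIndex();
        TrackedToken franz = new TrackedToken(8, "Franz");
        TrackedToken suffix = new TrackedToken(14, "III.");
        index.add(franz);
        index.add(suffix);

        try {
            suffix.setBegin(8);           // key change while still indexed
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }

        // Safe order: remove from the index, change the key, add back.
        index.remove(franz);
        index.remove(suffix);
        suffix.setBegin(8);
        index.add(suffix);
        System.out.println(index.size()); // prints 1
    }
}
```

With checkingEnabled set to false the setter does no extra work beyond one 
boolean test, which is the kind of mitigation meant above.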


> The curious case of the zombie annotation
> -----------------------------------------
>
>                 Key: UIMA-4049
>                 URL: https://issues.apache.org/jira/browse/UIMA-4049
>             Project: UIMA
>          Issue Type: Bug
>          Components: Core Java Framework
>            Reporter: Richard Eckart de Castilho
>            Assignee: Marshall Schor
>         Attachments: CuriousTestCase.java
>
>
> When annotations are removed from indexes, sometimes they come back... the 
> following test case shows how an annotation is removed but still present when 
> iterating over the index later.
> {code}
>     @Test
>     public void testForZombies() throws Exception
>     {
>         // No zombie here
>         int[] offsets1 = { 0, 4, 5, 11, 12, 21, 22, 25, 26, 29, 30, 35, 36, 40,
>                 41, 50, 51, 60, 61, 64, 64, 65 };
>         testForZombies(
>                 "Dies flößte Friedrich II. für seine neue Eroberung Besorgnis ein.",
>                 offsets1);
>         
>         // Zombie hiding in here
>         int[] offsets2 = { 0, 3, 4, 7, 8, 13, 14, 18, 19, 22, 23, 33, 34, 35 };
>         testForZombies("Ich bin Franz III. von Hammerfels !", offsets2);
>     }
>     public void testForZombies(String aText, int[] aOffsets) throws Exception
>     {
>         // Init some dictionaries we use
>         Set<String> names = new HashSet<String>();
>         names.add("Friedrich");
>         names.add("Franz");
>         Set<String> suffix = new HashSet<String>();
>         suffix.add("II.");
>         suffix.add("III.");
>         // Set up type system
>         TypeSystemDescription tsd = new TypeSystemDescription_impl();
>         tsd.addType("Token", "", CAS.TYPE_NAME_ANNOTATION);
>         
>         // Create CAS
>         CAS jcas = CasCreationUtils.createCas(tsd, null, null);
>         jcas.setDocumentText(aText);
>         
>         Type tokenType = jcas.getTypeSystem().getType("Token");
>         Feature beginFeature = tokenType.getFeatureByBaseName("begin");
>         
>         // Create tokens in CAS
>         for (int i = 0; i < aOffsets.length; i += 2) {
>             jcas.addFsToIndexes(
>                     jcas.createAnnotation(tokenType, aOffsets[i], aOffsets[i+1]));
>         }
>         
>         // List the tokens in the CAS
>         for (AnnotationFS token : jcas.getAnnotationIndex(tokenType)) {
>             System.out.printf("Starting with %s%n", token.getCoveredText());
>         }
>         // Merge some tokens, in particular "Franz" "III." -> "Franz III."
>         // and "Friedrich" "II." into "Friedrich II."
>         AnnotationFS previous = null;
>         List<AnnotationFS> toDelete = new ArrayList<>();
>         for (AnnotationFS token : jcas.getAnnotationIndex(tokenType)) {
>             if (previous != null && names.contains(previous.getCoveredText())
>                     && suffix.contains(token.getCoveredText())) {
>                 token.setIntValue(beginFeature, previous.getBegin());
>                 toDelete.add(previous);
>             }
>             previous = token;
>         }
>         // Remove the no longer necessary tokens ("Friedrich" and "Franz"),
>         // since we expanded the following tokens "III." and "II." to include
>         // their text
>         Set<String> removedWords = new HashSet<String>();
>         for (AnnotationFS token : toDelete) {
>             System.out.printf("Removing %s%n", token.getCoveredText());
>             removedWords.add(token.getCoveredText());
>             jcas.removeFsFromIndexes(token);
>         }
>         // Check if the tokens that we wanted to remove are really gone
>         for (AnnotationFS token : jcas.getAnnotationIndex(tokenType)) {
>             System.out.printf("Remaining %s%n", token.getCoveredText());
>             if (removedWords.contains(token.getCoveredText())) {
>                org.junit.Assert.fail("I saw a zombie!!!");
>             }
>         }
>     }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
