[ 
https://issues.apache.org/jira/browse/UIMA-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284294#comment-14284294
 ] 

Richard Eckart de Castilho commented on UIMA-3399:
--------------------------------------------------

I'm looking into this in more detail. Unfortunately, I haven't yet managed to 
set up a minimal test case. However, I found that at some point I have an 
FSIterator "i" with the following configuration (output of toString()):

{noformat}
FSIteratorWrapper [it=LeafPointerIterator [iicp=IndexIteratorCachePair, 
index=FSLeafIndexImpl 
[type=de.tudarmstadt.ukp.dkpro.core.api.coref.type.CoreferenceChain, 
kind=Default Bag]
  cache 0  FSLeafIndexImpl 
[type=de.tudarmstadt.ukp.dkpro.core.api.coref.type.CoreferenceChain, 
kind=Default Bag]
, index=org.apache.uima.cas.impl.FSBagIndex$IntVectorIterator@39dcf4b0]]
{noformat}

This iterator is valid (i.isValid() returns true), but a copy of the iterator 
is not valid (i.copy().isValid() returns false). This does not happen when 
ALLOW_DUP_ADD_TO_INDEXES is set to "true", but it happens consistently when 
ALLOW_DUP_ADD_TO_INDEXES is not set.

So far, this does not happen when I create a minimal CAS containing only a few 
annotations of my CoreferenceChain type. It does happen in one situation where 
the CAS already contains many other kinds of annotations.


> More consistent handling of multiple add-to-index behavior for same Feature 
> Structure
> -------------------------------------------------------------------------------------
>
>                 Key: UIMA-3399
>                 URL: https://issues.apache.org/jira/browse/UIMA-3399
>             Project: UIMA
>          Issue Type: Improvement
>            Reporter: Marshall Schor
>            Assignee: Marshall Schor
>            Priority: Minor
>             Fix For: 2.7.0SDK
>
>
> UIMA has a somewhat unusual indexing architecture.  You can define indexes 
> (sorted, bag, set), and then add / remove a feature structure (FS) to all of 
> the defined indexes.
> The design intention (I think) was to support the concept of a FS being 
> indexed, or not.  However, the current design allows some anomalies that 
> behave inconsistently between code being run "locally", versus as remote 
> services (due to how serialization handles this).  Serialization encodes only 
> the concept of a FS being either in an index or not. 
> The problem arises in the edge case where the same identical FS is added to 
> the indexes multiple times.  For local (non-remote) cases, for bag and sorted 
> indexes, the same exact FS would be added multiple times.  This would have 
> the following consequences:
> -  Iterating would return multiple == FSs.
> -  Remove from indexes of a multiply-added FS would reduce the number by 1; 
> the FS would still be in the index unless the last remaining one was removed.
> For the same code, running remotely, serialization would have "collapsed" the 
> multiple additions into one, so it would behave differently.
> This Jira changes the behavior of "add-to-index" so that subsequent 
> add-to-indexes of the same identical FS are a no-op. To cover users who 
> might be exploiting the old behavior, the JVM property 
> "uima.allow_duplicate_add_to_indices", read when the UIMA classes are loaded, 
> restores the previous behavior.
> Note that with this change, the UIMA "Set" index still has a distinct 
> purpose, separate from the "Bag" index, because it defines Feature Structure 
> equivalence based not on identity, but rather on specified key feature values 
> being equal.  
> This change better aligns how code running locally or remotely works.
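
The difference between the old and new add-to-index behavior described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyBagIndex and its methods are hypothetical names, it is not UIMA's FSBagIndex, and the boolean constructor argument merely stands in for the effect of the "uima.allow_duplicate_add_to_indices" property.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a bag index. NOT UIMA's implementation; all names are hypothetical. */
final class ToyBagIndex<T> {
    private final List<T> entries = new ArrayList<>();
    // Stands in for the effect of "uima.allow_duplicate_add_to_indices" (assumption).
    private final boolean allowDuplicateAdds;

    ToyBagIndex(boolean allowDuplicateAdds) {
        this.allowDuplicateAdds = allowDuplicateAdds;
    }

    /** Adds fs; if duplicates are not allowed, re-adding the identical object is a no-op. */
    void add(T fs) {
        if (!allowDuplicateAdds && count(fs) > 0) {
            return;
        }
        entries.add(fs);
    }

    /** Removes one occurrence of the identical object; returns false if none was present. */
    boolean remove(T fs) {
        for (int i = 0; i < entries.size(); i++) {
            if (entries.get(i) == fs) {
                entries.remove(i);
                return true;
            }
        }
        return false;
    }

    /** Counts occurrences by identity (==), not by equals(). */
    int count(T fs) {
        int n = 0;
        for (T e : entries) {
            if (e == fs) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        Object fs = new Object();

        // Old behavior: two adds leave two entries, so one remove leaves one behind.
        ToyBagIndex<Object> oldBehavior = new ToyBagIndex<>(true);
        oldBehavior.add(fs);
        oldBehavior.add(fs);
        oldBehavior.remove(fs);
        System.out.println("old behavior, entries left: " + oldBehavior.count(fs));

        // New behavior: the second add is a no-op, so one remove empties the index.
        ToyBagIndex<Object> newBehavior = new ToyBagIndex<>(false);
        newBehavior.add(fs);
        newBehavior.add(fs);
        newBehavior.remove(fs);
        System.out.println("new behavior, entries left: " + newBehavior.count(fs));
    }
}
```

In this model the new behavior matches what serialization already implies: a FS is either in the index or not, so local and remote runs see the same contents.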



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
