[ 
https://issues.apache.org/jira/browse/UIMA-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285456#comment-14285456
 ] 

Richard Eckart de Castilho commented on UIMA-3399:
--------------------------------------------------

Looking at {{this.index}}, I didn't think it would get us much further. So instead of 
posting its value, I tried to track this down further and modified 
LeafPointerIterator locally as follows, which allowed me to step into the 
{{index.isValid()}} method after the move:

{noformat}
Class: FSIndexRepositoryImpl.LeafPointerIterator

965:    private LeafPointerIterator(IndexIteratorCachePair iicp, int fs) {
966:      this(iicp);
967:      moveTo(fs);
  +:      index.isValid();
968:    }
{noformat}

Stepping through this shows that {{index.isValid()}} is still true after line 
*966* but becomes false after line *967*. So the problem appears to be 
triggered in {{moveTo(fs)}}.
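
The probe above can also be written as a reusable fail-fast check. This is a minimal sketch under stated assumptions: {{PositionIterator}}, {{SortedSetIterator}} and {{IteratorChecks}} are hypothetical names, not UIMA's internal iterator API, and {{moveTo}} is assumed to position the iterator at the first element greater than or equal to its argument.

```java
import java.util.TreeSet;

// Hypothetical minimal interface mirroring the two calls used in the probe;
// it is NOT UIMA's internal iterator interface.
interface PositionIterator {
    void moveTo(int position);
    boolean isValid();
}

// Hypothetical iterator over a sorted set of absolute positions,
// used only to exercise the check below.
class SortedSetIterator implements PositionIterator {
    private final TreeSet<Integer> positions;
    private Integer current;

    SortedSetIterator(TreeSet<Integer> positions) {
        this.positions = positions;
        this.current = positions.isEmpty() ? null : positions.first();
    }

    @Override
    public void moveTo(int position) {
        // Assumed semantics: first element >= position, or null if none.
        current = positions.ceiling(position);
    }

    @Override
    public boolean isValid() {
        return current != null;
    }
}

final class IteratorChecks {
    private IteratorChecks() {}

    // Moves the iterator and fails fast if it is invalid afterwards, which
    // should not happen when the target position is present in the index.
    static void moveToAndCheck(PositionIterator it, int position) {
        it.moveTo(position);
        if (!it.isValid()) {
            throw new IllegalStateException(
                "iterator invalid after moveTo(" + position + ")");
        }
    }
}
```

With a check like this in the constructor, the invalid state would presumably surface as an exception right after the move instead of requiring a debugger session.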

Looking at what happens in {{moveTo}}, I found that it eventually updates the 
index position at

{noformat}
Class: FSBagIndex.IntVectorIterator.moveTo(int)

145:    public void moveTo(int i) {
146:      this.itPos = find(i);
{noformat}

The position passed in here is *i = 531*. 
The index uses a {{PositiveIntSet_impl}} with *offset = 465* and *set = {66}*.
Line 146 finds that position *531 (= offset + set[0])* is in the index and 
changes {{itPos}} from its previous value *66* to *531*, the value returned from 
{{find(66)}}. Maybe {{find}} should have returned *66* instead of *531*?

When {{index.isValid()}} is now called, the flow eventually ends up in 
{{IntBitSet.isValid(p)}} with *p = 531*. However, {{set.get(531)}} returns 
*false* because {{set.get()}} expects a value relative to the offset; 
{{set.get(66)}} returns *true*.
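
To make the mismatch concrete, here is a simplified, hypothetical model of an offset-based bit set. The class and method names are illustrative and do not reproduce {{PositiveIntSet_impl}} or {{IntBitSet}}; it assumes, as observed above, that {{find}} hands back the absolute value while the validity check looks up a raw (relative) bit index.

```java
import java.util.BitSet;

// Hypothetical simplified model of an offset-based bit set; not the UIMA classes.
class OffsetBitSet {
    private final BitSet bits = new BitSet();
    private final int offset;

    OffsetBitSet(int offset) {
        this.offset = offset;
    }

    // Stores an absolute value as the relative bit (value - offset).
    void add(int absoluteValue) {
        bits.set(absoluteValue - offset);
    }

    // Returns the absolute value if present, or -1 otherwise.
    int find(int absoluteValue) {
        int relative = absoluteValue - offset;
        return (relative >= 0 && bits.get(relative)) ? absoluteValue : -1;
    }

    // Inconsistent variant: treats its argument as a relative bit index,
    // so passing the absolute value returned by find() yields false.
    boolean isValidRelative(int p) {
        return p >= 0 && bits.get(p);
    }

    // Consistent variant: converts the absolute position before the lookup.
    boolean isValidAbsolute(int p) {
        int relative = p - offset;
        return relative >= 0 && bits.get(relative);
    }
}
```

With the numbers from above (offset 465, stored bit 66), {{find(531)}} returns 531, {{isValidRelative(531)}} is false, {{isValidRelative(66)}} is true and {{isValidAbsolute(531)}} is true, mirroring the observed behaviour. Either side can be fixed, as long as both agree on one convention.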

I wonder if {{IntBitSet}} should throw an {{IndexOutOfBoundsException}} when 
accessing a

{noformat}
position > offset + set[size - 1]
{noformat}
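
One hypothetical way to express such a guard: treat any argument past the largest stored bit as a caller error instead of silently returning false. The class below is illustrative only (not UIMA's {{IntBitSet}}), and whether throwing is the right contract is exactly the open question here.

```java
import java.util.BitSet;

// Hypothetical offset-based bit set whose lookup rejects out-of-range
// arguments instead of silently returning false. Not the UIMA IntBitSet.
class CheckedOffsetBitSet {
    private final BitSet bits = new BitSet();
    private final int offset;

    CheckedOffsetBitSet(int offset) {
        this.offset = offset;
    }

    // Stores an absolute value as the relative bit (value - offset).
    void add(int absoluteValue) {
        if (absoluteValue < offset) {
            throw new IllegalArgumentException("value below offset: " + absoluteValue);
        }
        bits.set(absoluteValue - offset);
    }

    // Expects a position relative to the offset. Anything past the largest
    // stored bit (offset + set[size - 1] in absolute terms) is rejected,
    // which exposes callers that pass absolute values by mistake.
    boolean get(int relativePosition) {
        int largest = bits.length() - 1; // -1 if the set is empty
        if (relativePosition < 0 || relativePosition > largest) {
            throw new IndexOutOfBoundsException(
                "position " + relativePosition + " outside [0, " + largest + "]"
                + " (offset " + offset + ")");
        }
        return bits.get(relativePosition);
    }
}
```

With offset 465 and the value 531 stored, {{get(66)}} returns true, while the mistaken {{get(531)}} throws instead of quietly reporting false.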

> More consistent handling of multiple add-to-index behavior for same Feature 
> Structure
> -------------------------------------------------------------------------------------
>
>                 Key: UIMA-3399
>                 URL: https://issues.apache.org/jira/browse/UIMA-3399
>             Project: UIMA
>          Issue Type: Improvement
>            Reporter: Marshall Schor
>            Assignee: Marshall Schor
>            Priority: Minor
>             Fix For: 2.7.0SDK
>
>
> UIMA has a somewhat unusual indexing architecture.  You can define indexes 
> (sorted, bag, set), and then add / remove a feature structure (FS) to all of 
> the defined indexes.
> The design intention (I think) was to support the concept of a FS being 
> indexed, or not.  However, the current design allows some anomalies that 
> behave inconsistently between code being run "locally", versus as remote 
> services (due to how serialization handles this).  Serialization encodes only 
> the concept of a FS being either in an index or not. 
> The problem arises in the edge case where the same identical FS is added to 
> the indexes multiple times.  For local (non-remote) cases, for bag and sorted 
> indexes, the same exact FS would be added multiple times.  This would have 
> the consequences:
> -  Iterating would return multiple == FSs.
> -  Removing a multiply-added FS from the indexes would reduce the count by 1; 
> the FS would still be in the index unless the last remaining one was removed.
> For the same code, running remotely, serialization would have "collapsed" the 
> multiple additions into one, so it would behave differently.
> This Jira changes the behavior of "add-to-index" so that subsequent 
> add-to-indexes of the same identical FS would be a no-op. To cover users who 
> might be exploiting the old behavior, the JVM property 
> "uima.allow_duplicate_add_to_indices", read when the UIMA classes are loaded, 
> would restore the previous behavior.
> Note that with this change, the UIMA "Set" index still has a distinct purpose, 
> separate from the "Bag" index, because it defines Feature Structure 
> equivalence based not on identity, but rather on specified key feature values 
> being equal.
> This change better aligns how code running locally or remotely works.
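
The difference between the old and new add-to-index semantics can be modelled with a toy bag index. This is a hypothetical sketch, not UIMA code: {{ToyBagIndex}} is an invented name, and only the property name {{uima.allow_duplicate_add_to_indices}} comes from the issue; it is read once when the class is loaded, as the issue describes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy bag index illustrating add-to-index semantics; not UIMA code.
class ToyBagIndex<T> {
    // Read once at class-load time; setting the JVM property to "true"
    // restores the old behaviour in which the same FS can be added repeatedly.
    static final boolean ALLOW_DUPLICATES =
        Boolean.getBoolean("uima.allow_duplicate_add_to_indices");

    private final List<T> entries = new ArrayList<>();
    private final boolean allowDuplicates;

    ToyBagIndex() {
        this(ALLOW_DUPLICATES);
    }

    ToyBagIndex(boolean allowDuplicates) {
        this.allowDuplicates = allowDuplicates;
    }

    // Adds fs unless the identical (==) object is already indexed and
    // duplicates are not allowed, in which case the call is a no-op.
    void add(T fs) {
        if (!allowDuplicates && containsIdentical(fs)) {
            return;
        }
        entries.add(fs);
    }

    // Removes a single occurrence of the identical object, if present.
    void remove(T fs) {
        for (int i = 0; i < entries.size(); i++) {
            if (entries.get(i) == fs) {
                entries.remove(i);
                return;
            }
        }
    }

    boolean containsIdentical(T fs) {
        for (T e : entries) {
            if (e == fs) {
                return true;
            }
        }
        return false;
    }

    int size() {
        return entries.size();
    }
}
```

With duplicates allowed (the old behaviour), adding the same object twice and removing it once leaves it indexed; with the new default the second add is ignored, which matches what a serialization round trip would produce.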



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to