[ 
https://issues.apache.org/jira/browse/UIMA-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marshall Schor updated UIMA-4100:
---------------------------------
    Description: 
Delta CAS serialization attempts to handle changes that a remote makes 
"below the line", that is, changes to existing FSs.  If the remote removes 
FSs and re-adds them to the indices, this fact is sent back with the delta 
CAS as a list of "re-indexed" FSs.  Deserialization takes this list, removes 
the FSs from the indices, and re-adds them in an attempt to get them properly 
indexed.  However, this does not work reliably if any of the keys used in the 
indexing were updated. 

Furthermore, nothing ensures that a remote removes and re-adds the FSs it is 
modifying.  Perhaps the remote is unaware that FSs of that type are even being 
indexed (the information about what indices are defined at the client is not 
sent to the remote). 

So the remote could modify a feature that the client uses as an index key.  
Deserialization would then update that feature without first removing the FS 
from the indices.
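
The failure mode can be reproduced outside UIMA.  The sketch below is a toy 
model, not UIMA code: a java.util.TreeSet stands in for a sorted index, and 
the Fs class, its "begin" key feature, and the method names are all 
hypothetical.  It updates a key while the FS is still indexed and then tries 
the remove/re-add "re-index" step afterwards.

```java
import java.util.Comparator;
import java.util.TreeSet;

class StaleKeyDemo {
    /** Toy stand-in for a feature structure with one indexed feature. */
    static final class Fs {
        final int id;
        int begin; // hypothetical key feature used by the sorted index
        Fs(int id, int begin) { this.id = id; this.begin = begin; }
    }

    /** Toy sorted index: ordered by the key feature, ties broken by id. */
    static TreeSet<Fs> newIndex() {
        return new TreeSet<>(
            Comparator.<Fs>comparingInt(f -> f.begin).thenComparingInt(f -> f.id));
    }

    /** Returns {foundAfterUpdate, removedDuringReindex, sizeAfterReindex}. */
    static int[] reindexAfterUpdate() {
        TreeSet<Fs> index = newIndex();
        Fs a = new Fs(1, 1), b = new Fs(2, 5), c = new Fs(3, 9);
        index.add(a);
        index.add(b);
        index.add(c);

        a.begin = 20; // key updated while the FS is still in the index

        boolean found = index.contains(a); // search follows the new key value
        boolean removed = index.remove(a); // the "re-index" remove misses too
        index.add(a);                      // so the add creates a second entry
        return new int[] { found ? 1 : 0, removed ? 1 : 0, index.size() };
    }

    public static void main(String[] args) {
        int[] r = reindexAfterUpdate();
        System.out.println("found=" + r[0] + " removed=" + r[1] + " size=" + r[2]);
    }
}
```

In this toy setup the lookup and the remove both miss the stale entry, so the 
re-add leaves the same FS in the index twice (size 4 instead of 3).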

One conclusion - the list of "re-indexed" FSs returned by the remote during a 
delta serialization is not a reliable indication of which FSs would need 
reindexing, since the index definitions between the client and the remote could 
be different.

One fix: for mods to an existing FS, remove the FS from the indices in each 
view, do the mods, and then add it back (if it was previously found in the 
indices for that view).  Some optimizations: for subtypes of AnnotationBase - 
only one view need be checked.  The remove could be skipped if none of the 
slots being updated are in use as keys in some index; this might be an 
expensive check, though. 
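
A minimal sketch of the remove / modify / add-back sequence, again as a toy 
model rather than the actual deserializer code: each view's index is modeled 
as a java.util.TreeSet, and Fs, "begin", and updateBegin are hypothetical 
names.  The optimizations mentioned above are not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class RemoveModifyAddDemo {
    /** Toy stand-in for a feature structure with one indexed feature. */
    static final class Fs {
        final int id;
        int begin; // hypothetical key feature used by the sorted index
        Fs(int id, int begin) { this.id = id; this.begin = begin; }
    }

    /** Toy per-view sorted index: ordered by key feature, ties broken by id. */
    static TreeSet<Fs> newIndex() {
        return new TreeSet<>(
            Comparator.<Fs>comparingInt(f -> f.begin).thenComparingInt(f -> f.id));
    }

    /**
     * Updates the key feature of fs. The FS is removed from every view index
     * that contains it, then modified, then added back to those views only.
     * Returns the number of views it was re-added to.
     */
    static int updateBegin(List<TreeSet<Fs>> viewIndexes, Fs fs, int newBegin) {
        List<TreeSet<Fs>> hadIt = new ArrayList<>();
        for (TreeSet<Fs> index : viewIndexes) {
            if (index.remove(fs)) { // remove while the old key is still valid
                hadIt.add(index);
            }
        }
        fs.begin = newBegin;        // do the modification
        for (TreeSet<Fs> index : hadIt) {
            index.add(fs);          // re-add under the new key
        }
        return hadIt.size();
    }

    public static void main(String[] args) {
        TreeSet<Fs> view1 = newIndex();
        TreeSet<Fs> view2 = newIndex();
        Fs a = new Fs(1, 1), b = new Fs(2, 5);
        view1.add(a);
        view1.add(b);
        view2.add(b);
        int readded = updateBegin(List.of(view1, view2), a, 20);
        System.out.println("re-added to " + readded + " view(s); still found: "
            + view1.contains(a));
    }
}
```

Because the remove happens before the key changes, the lookup still finds the 
FS, and the FS is only re-added to views where it was indexed to begin with.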

This is a problem for all implementations that support Delta CAS:  Xmi, Binary, 
BinaryCompressed types 4 and 6.

  was:
Delta CAS serialization attempts to handle changes that a remote makes 
"below the line", that is, changes to existing FSs.  If the remote removes 
FSs and re-adds them to the indices, this fact is sent back with the delta 
CAS as a list of "re-indexed" FSs.  Deserialization takes this list, removes 
the FSs from the indices, and re-adds them in an attempt to get them properly 
indexed.  However, this does not work reliably if any of the keys used in the 
indexing were updated. 

Furthermore, nothing ensures that a remote removes and re-adds the FSs it is 
modifying.  Perhaps the remote is unaware that FSs of that type are even being 
indexed (the information about what indices are defined at the client is not 
sent to the remote). 

So the remote could modify a feature that the client uses as an index key.  
Deserialization would then update that feature without first removing the FS 
from the indices.

One conclusion - the list of "re-indexed" FSs returned by the remote during a 
delta serialization is not a reliable indication of which FSs would need 
reindexing, since the index definitions between the client and the remote could 
be different.

A proper fix would be to check, before updating an existing feature, whether 
it is used as a key in any index at the client, and if so, remove the FS from 
the indices, update the feature, and then add the FS back. 

If Delta CAS is being used, this would require setting up the mechanisms 
proposed in UIMA-4059 to detect, while deserializing changes to existing FSs, 
whether changes are being made to features that are used as keys in any index 
*and* the FS being modified is "added-to-indexes". 

It's likely that all forms of delta deserialization (XMI, Binary, and 
Compressed Binary) have this issue.


> deserialization of delta CASes broken in some cases
> ---------------------------------------------------
>
>                 Key: UIMA-4100
>                 URL: https://issues.apache.org/jira/browse/UIMA-4100
>             Project: UIMA
>          Issue Type: Bug
>          Components: Core Java Framework
>    Affects Versions: 2.6.0SDK
>            Reporter: Marshall Schor
>            Assignee: Marshall Schor
>             Fix For: 2.7.0SDK
>
>
> Delta CAS serialization attempts to handle changes that a remote makes 
> "below the line", that is, changes to existing FSs.  If the remote removes 
> FSs and re-adds them to the indices, this fact is sent back with the delta 
> CAS as a list of "re-indexed" FSs.  Deserialization takes this list, removes 
> the FSs from the indices, and re-adds them in an attempt to get them 
> properly indexed.  However, this does not work reliably if any of the keys 
> used in the indexing were updated. 
> Furthermore, nothing ensures that a remote removes and re-adds the FSs it is 
> modifying.  Perhaps the remote is unaware that FSs of that type are even 
> being indexed (the information about what indices are defined at the client 
> is not sent to the remote). 
> So the remote could modify a feature that the client uses as an index key.  
> Deserialization would then update that feature without first removing the 
> FS from the indices.
> One conclusion - the list of "re-indexed" FSs returned by the remote during a 
> delta serialization is not a reliable indication of which FSs would need 
> reindexing, since the index definitions between the client and the remote 
> could be different.
> One fix: for mods to an existing FS, remove the FS from the indices in each 
> view, do the mods, and then add it back (if it was previously found in the 
> indices for that view).  Some optimizations: for subtypes of 
> AnnotationBase - only one view need be checked.  The remove could be skipped 
> if none of the slots being updated are in use as keys in some index; this 
> might be an expensive check, though. 
> This is a problem for all implementations that support Delta CAS:  Xmi, 
> Binary, BinaryCompressed types 4 and 6.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)