[ https://issues.apache.org/jira/browse/UIMA-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marshall Schor updated UIMA-4100:
---------------------------------
Description:
Delta CAS serialization attempts to handle changes made by a remote
"below the line", that is, changes to existing FSs. If the remote removes FSs
from the indices and re-adds them, this fact is sent back with the delta CAS
as a list of FSs that are "re-indexed". The deserialization takes this list,
removes those FSs from the indices, and re-adds them in an attempt to get them
properly indexed. However, this doesn't work reliably if any of the keys used
in the indexing were updated.
Furthermore, nothing ensures that a remote removes and re-adds the FSs it is
modifying. Perhaps the remote is unaware that FSs of that type are even being
indexed (the information about what indices are defined at the client is not
sent to the remote).
So it is possible that the remote could modify some feature that is used as
a key by the client. The deserialization would then update that feature without
removing the FS from the indices first.
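The failure mode can be illustrated outside of UIMA. The following is a minimal sketch, assuming a sorted index modeled by a plain java.util.TreeSet; the class StaleKeyDemo, the Fs stand-in, and the "begin" key are hypothetical and are not part of the UIMA API. It overwrites a key while the FS is still indexed and then attempts the remove / re-add step afterwards, as a "re-indexed" list would.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from UIMA.
class StaleKeyDemo {
    // Stand-in for a feature structure whose "begin" feature is an index key.
    static final class Fs {
        final int id;
        int begin;
        Fs(int id, int begin) { this.id = id; this.begin = begin; }
    }

    // Sorted index keyed on begin, with id as a tie-breaker.
    static TreeSet<Fs> newIndex() {
        return new TreeSet<>(
            Comparator.comparingInt((Fs fs) -> fs.begin).thenComparingInt(fs -> fs.id));
    }

    // Updates the key in place, then tries to re-index afterwards.
    // Returns the resulting index size; 3 would be the correct answer.
    static int sizeAfterLateReindex() {
        TreeSet<Fs> index = newIndex();
        Fs a = new Fs(1, 10), b = new Fs(2, 20), c = new Fs(3, 30);
        index.add(a);
        index.add(b);
        index.add(c);

        a.begin = 40;     // key overwritten while the FS is still indexed

        index.remove(a);  // lookup uses the new key, so the old entry is not found
        index.add(a);     // inserts a second entry for the same FS
        return index.size();
    }
}
```

In this sketch the remove is a silent no-op because the lookup follows the new key value, so the re-add leaves the same object in the index twice.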
One conclusion - the list of "re-indexed" FSs returned by the remote during a
delta serialization is not a reliable indication of which FSs would need
reindexing, since the index definitions between the client and the remote could
be different.
A proper fix would be to check, before updating an existing feature, whether it
is used as a key in any index at the client; if so, remove the FS from the
indices, update the feature, and then add the FS back.
If Delta CAS is being used, this would require setting up the mechanisms
proposed in UIMA-4059 to detect, when deserializing changes to existing
FSs, whether changes are being made to features that are used as
keys in any index *and* the FS being modified was "added-to-indexes".
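The proposed order of operations can be sketched as follows. This is only an illustration under stated assumptions: the index is again modeled by a java.util.TreeSet, and SafeKeyUpdate, Fs, KEY_FEATURES, and applyUpdate are hypothetical names, not the UIMA-4059 mechanism or any UIMA API. The point is the sequence: check whether the feature is a key and the FS is indexed, remove, update, then add back.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names come from UIMA.
class SafeKeyUpdate {
    static final class Fs {
        final int id;
        int begin;  // used as an index key in this sketch
        int score;  // not used as a key
        Fs(int id, int begin) { this.id = id; this.begin = begin; }
    }

    // Features that take part in some index definition at the client (assumed).
    static final Set<String> KEY_FEATURES = Collections.singleton("begin");

    final TreeSet<Fs> index = new TreeSet<>(
        Comparator.comparingInt((Fs fs) -> fs.begin).thenComparingInt(fs -> fs.id));

    // Applies one deserialized feature change to an existing FS.
    void applyUpdate(Fs fs, String feature, int value) {
        // The membership test runs before the update, while the key is still valid.
        boolean mustReindex = KEY_FEATURES.contains(feature) && index.contains(fs);
        if (mustReindex) {
            index.remove(fs);  // remove under the old key value
        }
        switch (feature) {
            case "begin": fs.begin = value; break;
            case "score": fs.score = value; break;
            default: throw new IllegalArgumentException("unknown feature: " + feature);
        }
        if (mustReindex) {
            index.add(fs);     // re-add under the new key value
        }
    }

    // Small self-check: the index stays consistent after a key update.
    static boolean demo() {
        SafeKeyUpdate cas = new SafeKeyUpdate();
        Fs a = new Fs(1, 10), b = new Fs(2, 20), c = new Fs(3, 30);
        cas.index.add(a);
        cas.index.add(b);
        cas.index.add(c);
        cas.applyUpdate(a, "begin", 40);
        cas.applyUpdate(b, "score", 7);  // non-key feature: no re-indexing needed
        return cas.index.size() == 3 && cas.index.last() == a && cas.index.contains(a);
    }
}
```

An FS that is not in the index, or a feature that is not a key, is updated directly, so the remove / add cost is only paid when the sort order could actually change.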
It's likely that all forms of delta deserialization (XMI, Binary, and
Compressed Binary) have this issue.
> deserialization of delta CASes broken in some cases
> ---------------------------------------------------
>
> Key: UIMA-4100
> URL: https://issues.apache.org/jira/browse/UIMA-4100
> Project: UIMA
> Issue Type: Bug
> Components: Core Java Framework
> Affects Versions: 2.6.0SDK
> Reporter: Marshall Schor
> Assignee: Marshall Schor
> Fix For: 2.7.0SDK
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)