Marshall Schor created UIMA-4820:
------------------------------------
Summary: uv3 Supporting Delta deserialization requires holding on
to FSs serialized
Key: UIMA-4820
URL: https://issues.apache.org/jira/browse/UIMA-4820
Project: UIMA
Issue Type: Bug
Components: Core Java Framework
Reporter: Marshall Schor
Assignee: Marshall Schor
Fix For: 3.0.0SDKexp
UIMA supports delta deserialization in several formats. In this scenario, a CAS
is serialized (to, for example, a remote service), and a later delta
serialization returns just the changes back to the original CAS.
There are two approaches used to get the set of FSs to serialize.
* One way, used for plain binary and form4 compressed, scans the "heap"
sequentially, and sends all those FSs, including potentially FSs that are not
"reachable".
* The other way is to use the indexes plus following reference chains to locate
all "reachable" FSs, and only send those. This is used for XCAS, XMI, JSON,
and Form6 compressed.
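
The difference between the two approaches can be sketched roughly as follows. This is only an illustration: the Fs class, the list standing in for the "heap", and both method names are hypothetical stand-ins, not UIMA classes or APIs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-ins for illustration only; not UIMA classes. */
final class FsSelectionSketch {

    /** A toy feature structure: a name plus references to other FSs. */
    static final class Fs {
        final String name;
        final List<Fs> refs = new ArrayList<>();
        Fs(String name) { this.name = name; }
    }

    /** Approach 1: walk the simulated heap in order and send everything found. */
    static List<Fs> selectByHeapScan(List<Fs> heap) {
        return new ArrayList<>(heap);
    }

    /** Approach 2: start at the indexed FSs and follow reference chains. */
    static List<Fs> selectReachable(List<Fs> indexed) {
        Set<Fs> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        List<Fs> result = new ArrayList<>();
        Deque<Fs> work = new ArrayDeque<>(indexed);
        while (!work.isEmpty()) {
            Fs fs = work.pop();
            if (seen.add(fs)) {          // visit each FS once, even with cycles
                result.add(fs);
                fs.refs.forEach(work::push);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Fs a = new Fs("a");
        Fs b = new Fs("b");
        Fs orphan = new Fs("orphan");    // on the heap, but not indexed or referenced
        a.refs.add(b);
        List<Fs> heap = List.of(a, b, orphan);

        System.out.println(selectByHeapScan(heap).size());      // prints 3
        System.out.println(selectReachable(List.of(a)).size()); // prints 2
    }
}
```

The heap scan includes the orphan FS, while the reachability walk does not; that is why only the first approach depends on heap "addresses" staying stable.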
In V3, plain binary and form4 serialization need to preserve the simulated heap
"addresses" (per CAS) of the FSs sent, so that future delta deserializations
have the proper "heap" addresses. These addresses may not be recalculated
from the CAS FS contents, because intervening GCs may have garbage
collected some unreachable FSs.
Furthermore, plain and form4 non-delta deserialization, where a delta
serialization is to follow, must likewise preserve these simulated heap
addresses (per CAS) for all deserialized FSs.
This preservation is needed to ensure that the simulated "addresses" of FSs are
constant, even if unreachable FSs are reclaimed. In practice, this means that
various maps involving simulated heap "addresses" need to be retained and not
recreated.
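
A minimal sketch of such a retained map is shown here, assuming addresses are handed out sequentially starting at 1. The class and method names are hypothetical and are not the UIMA implementation. Because the table holds strong references, an FS that is otherwise unreachable is not reclaimed while the table is retained, and its "address" stays valid.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a retained per-CAS address table; not UIMA code. */
final class AddressTableSketch {
    private final Map<Object, Integer> fsToAddr = new IdentityHashMap<>();
    private final List<Object> addrToFs = new ArrayList<>();

    AddressTableSketch() {
        addrToFs.add(null); // assumption: address 0 is reserved for "no FS"
    }

    /** Returns the address already assigned to this FS, or assigns the next one. */
    int addressOf(Object fs) {
        Integer known = fsToAddr.get(fs);
        if (known != null) {
            return known;
        }
        int addr = addrToFs.size();
        addrToFs.add(fs);
        fsToAddr.put(fs, addr);
        return addr;
    }

    /** Looks up the FS for an address carried in a later delta. */
    Object fsAt(int addr) {
        return addrToFs.get(addr);
    }

    public static void main(String[] args) {
        AddressTableSketch table = new AddressTableSketch();
        Object reachable = new Object();
        Object unreachable = new Object();
        int a1 = table.addressOf(reachable);
        int a2 = table.addressOf(unreachable);

        unreachable = null; // the "CAS" drops it, but the table still holds it
        System.gc();

        System.out.println(table.fsAt(a2) != null);           // prints true
        System.out.println(table.addressOf(reachable) == a1); // prints true
    }
}
```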
Because these maps are retained, their storage needs to be released when no
longer needed:
* at CAS Reset time,
* after a services delta deserializer has completed deserializing
(potentially multiple) delta CASes, or
* when a new non-delta serialization is started (this will re-create the
storage).
For services use, we may add a new API to release this storage; the service
would call it after all delta deserializations for this CAS have been received.
This use case supports having multiple remotes working on a common CAS and
having their delta results merged back into the original CAS.
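
The intended lifecycle might look roughly like the following sketch. All names here, including releaseDeltaState(), are hypothetical; the new release API mentioned above has not been designed yet, so this only illustrates when the retained storage would be created and dropped.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lifecycle sketch for the retained delta state; not UIMA code. */
final class DeltaStateLifecycleSketch {
    private Map<Integer, String> addrToFs; // null means nothing is retained

    /** A new non-delta serialization discards old state and re-creates it. */
    void startNonDeltaSerialization() {
        addrToFs = new HashMap<>();
    }

    /** Remembers the simulated address of an FS that was sent. */
    void recordSent(int addr, String fs) {
        requireState().put(addr, fs);
    }

    /** Merges one change from a delta CAS; may be called for several remotes. */
    void mergeDelta(int addr, String newValue) {
        if (requireState().replace(addr, newValue) == null) {
            throw new IllegalArgumentException("unknown address " + addr);
        }
    }

    /** CAS Reset releases the retained storage. */
    void casReset() {
        addrToFs = null;
    }

    /** Hypothetical explicit release, called after the last delta is merged. */
    void releaseDeltaState() {
        addrToFs = null;
    }

    boolean holdsState() {
        return addrToFs != null;
    }

    private Map<Integer, String> requireState() {
        if (addrToFs == null) {
            throw new IllegalStateException("no retained delta state");
        }
        return addrToFs;
    }

    public static void main(String[] args) {
        DeltaStateLifecycleSketch cas = new DeltaStateLifecycleSketch();
        cas.startNonDeltaSerialization();
        cas.recordSent(1, "a");
        cas.recordSent(2, "b");
        cas.mergeDelta(1, "a-from-remote-1"); // delta from first remote
        cas.mergeDelta(2, "b-from-remote-2"); // delta from second remote
        cas.releaseDeltaState();              // all deltas received
        System.out.println(cas.holdsState()); // prints false
    }
}
```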
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)