Marshall Schor created UIMA-4820:
------------------------------------

             Summary: uv3 Supporting Delta deserialization requires holding on to FSs serialized
                 Key: UIMA-4820
                 URL: https://issues.apache.org/jira/browse/UIMA-4820
             Project: UIMA
          Issue Type: Bug
          Components: Core Java Framework
            Reporter: Marshall Schor
            Assignee: Marshall Schor
             Fix For: 3.0.0SDKexp


UIMA supports delta deserialization in various formats.  In this scheme a CAS 
is serialized (to, for example, a remote service), and a later delta 
serialization returns just the changes to the original CAS.  

There are two approaches used to get the set of FSs to serialize.  
* One way, used for plain binary and form4 compressed, scans the "heap" 
sequentially, and sends all those FSs, including potentially FSs that are not 
"reachable".  
* The other way is to use the indexes plus following reference chains to locate 
all "reachable" FSs, and only send those.  This is used for XCAS, XMI, JSON, 
and Form6 compressed.
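
The difference between the two approaches can be sketched as follows.  This is 
only an illustration: Fs and FsCollector are hypothetical stand-ins, not UIMA 
classes, and the "heap" is assumed to be a simple list of FSs in creation order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a feature structure; not the UIMA type.
final class Fs {
    final String name;
    final List<Fs> refs = new ArrayList<>();   // FSs referenced by this FS
    Fs(String name) { this.name = name; }
}

final class FsCollector {
    // Approach 1 (plain binary, form4): take every FS on the "heap" in
    // order, whether or not anything still refers to it.
    static List<Fs> scanHeap(List<Fs> heap) {
        return new ArrayList<>(heap);
    }

    // Approach 2 (XCAS, XMI, JSON, form6): start from the indexed FSs and
    // follow reference chains, so only reachable FSs are collected.
    static Set<Fs> collectReachable(List<Fs> indexed) {
        Set<Fs> seen = new LinkedHashSet<>();
        Deque<Fs> todo = new ArrayDeque<>(indexed);
        while (!todo.isEmpty()) {
            Fs fs = todo.pop();
            if (seen.add(fs)) {                // first visit only
                for (Fs ref : fs.refs) {
                    todo.push(ref);
                }
            }
        }
        return seen;
    }
}
```

With FSs a, b, c on the heap, where only a is indexed and a refers to b, the 
scan yields all three while the reachability walk yields only a and b.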

In V3, plain and form4 serialization need to preserve the simulated heap 
"addresses" (per CAS) of the FSs sent, so that future delta deserializations 
see the proper "heap" addresses.  These addresses cannot be recalculated from 
the CAS FS contents, because intervening GCs may have reclaimed some 
unreachable FSs.  

Furthermore, a plain or form4 non-delta deserialization that is to be followed 
by a delta serialization must likewise preserve these simulated heap addresses 
(per CAS) for all deserialized FSs.

This preservation is needed to ensure that the simulated "addresses" of FSs are 
constant, even if unreachable FSs are reclaimed.  In practice, this means that 
various maps involving simulated heap "addresses" need to be retained and not 
recreated.
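
A minimal sketch of such a retained map is shown below.  It is an assumption 
made for illustration, not the V3 implementation: SimulatedHeapAddresses is a 
hypothetical class, each FS is given one address slot (a real heap address 
would advance by the size of the FS), and address 0 is assumed to be reserved 
for the null reference.  Because the map holds strong references, an FS that 
has been assigned an address cannot be garbage collected until release() is 
called, which is what keeps the addresses constant.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-CAS table of simulated heap "addresses".
final class SimulatedHeapAddresses {
    // Identity-based: two distinct FSs never share an address.
    private final Map<Object, Integer> fsToAddr = new IdentityHashMap<>();
    private final List<Object> addrToFs = new ArrayList<>();

    SimulatedHeapAddresses() {
        addrToFs.add(null);                    // slot 0: null reference (assumed)
    }

    // Returns the address already assigned to fs, or assigns the next one.
    int addressOf(Object fs) {
        Integer existing = fsToAddr.get(fs);
        if (existing != null) {
            return existing;
        }
        int addr = addrToFs.size();
        addrToFs.add(fs);                      // strong reference retains the FS
        fsToAddr.put(fs, addr);
        return addr;
    }

    // Looks up the FS for an address received in a delta.
    Object fsAt(int addr) {
        return addrToFs.get(addr);
    }

    // Number of FSs currently retained.
    int size() {
        return fsToAddr.size();
    }

    // Drops all retained FSs and addresses.
    void release() {
        fsToAddr.clear();
        addrToFs.clear();
        addrToFs.add(null);
    }
}
```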

Because these maps are retained, their storage needs to be released when it is 
no longer needed:  at CAS Reset time, after a service's delta deserializer has 
completed deserializing (potentially multiple) delta CASes, or when a new 
non-delta serialization is started (which re-creates this storage).  For 
services use, we may add a new API to release this storage; the service would 
call it after all delta deserializations for this CAS have been received.  
This supports the use case of multiple remotes working on a common CAS and 
having their delta results merged back into the original CAS.  
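
One possible shape for that release point is sketched below.  Everything here 
is hypothetical: DeltaMergeTracker and its methods are invented for 
illustration and are not a proposed UIMA API, and the retained storage is 
reduced to a placeholder map.  The sketch counts outstanding delta replies and 
releases the storage once the last one has been merged.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tracker for one CAS sent to several remotes.
final class DeltaMergeTracker {
    // Placeholder for the retained simulated-address storage.
    private final Map<Integer, String> retained = new HashMap<>();
    private int pendingReplies;

    // A new non-delta serialization re-creates the retained storage.
    void startNonDeltaSerialization(int remotes) {
        retained.clear();
        retained.put(1, "fs-1");               // placeholder entry
        pendingReplies = remotes;
    }

    // Called after each delta CAS has been deserialized and merged.
    void deltaReceived() {
        if (pendingReplies == 0) {
            throw new IllegalStateException("no delta reply expected");
        }
        pendingReplies--;
        if (pendingReplies == 0) {
            release();                         // last reply: storage not needed
        }
    }

    // Explicit release; would also be called at CAS Reset time.
    void release() {
        retained.clear();
        pendingReplies = 0;
    }

    boolean holdsStorage() {
        return !retained.isEmpty();
    }
}
```

Counting replies is only one design choice; the explicit release() call alone 
would match the API described above, with the service deciding when all deltas 
have arrived.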



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
