[ 
https://issues.apache.org/jira/browse/UIMA-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marshall Schor updated UIMA-2498:
---------------------------------

    Description: 
Extend the binary compressed serialization to support cases where the type 
systems are not exactly the same.

There are 2 use cases.
First: the source is a previously saved file. The goal is to deserialize it 
into e.g. a tool, where the type system in the tool may be somewhat different 
than the type system used to create the file. (For instance, it may be at a 
different version level).

Second: the source is a client for a UIMA-AS service.  In this case, the client 
has read the service's type system, and has merged it with its own.

Differences between the type systems could be:
A type exists in one, but not in the other;
A type exists in both, but with different features (including those inherited
from supertypes).  Features could be added or removed.  Features could have
different ranges (incompatible ranges should cause error messages).

A suggested implementation approach: create a mapper that maps type codes and
feature codes; set it up by comparing the two type systems.  For the first use
case, implement a version of deserialization that takes the source type system
as an extra input, creates the converter, and then deserializes using the
conversions.  For the second use case, at initialization time, after the
service's type system has been read (for merging into the client's type system
definition), use it to create the same mapper between type codes and feature
codes; when sending a CAS via binary serialization, send it through the
mapping converter for type codes and feature codes.
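
The mapper idea can be illustrated with a minimal, self-contained sketch. It
does not use the UIMA API: ToyTypeSystem, TypeSystemMapperSketch, mapType and
mapFeature are hypothetical names invented for this illustration, type codes
are assumed to be declaration-order indexes, and inherited features are
ignored for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a type-code / feature-code mapper; not UIMA code. */
public class TypeSystemMapperSketch {

    /** Marks a source type or feature that has no counterpart in the target. */
    public static final int MISSING = -1;

    /** Toy type system: type name -> (feature name -> range name), in declaration order. */
    public static final class ToyTypeSystem {
        private final Map<String, Map<String, String>> types = new LinkedHashMap<>();

        /** Adds a type; the varargs are (featureName, rangeName) pairs. */
        public ToyTypeSystem add(String type, String... featureRangePairs) {
            Map<String, String> features = new LinkedHashMap<>();
            for (int i = 0; i + 1 < featureRangePairs.length; i += 2) {
                features.put(featureRangePairs[i], featureRangePairs[i + 1]);
            }
            types.put(type, features);
            return this;
        }
    }

    private final int[] typeMap;      // source type code -> target type code
    private final int[][] featureMap; // [source type code][source feature index] -> target feature index

    /** Builds the mapping once by comparing the two type systems by name. */
    public TypeSystemMapperSketch(ToyTypeSystem source, ToyTypeSystem target) {
        List<String> srcTypes = new ArrayList<>(source.types.keySet());
        List<String> tgtTypes = new ArrayList<>(target.types.keySet());
        typeMap = new int[srcTypes.size()];
        featureMap = new int[srcTypes.size()][];
        for (int t = 0; t < srcTypes.size(); t++) {
            String typeName = srcTypes.get(t);
            typeMap[t] = tgtTypes.indexOf(typeName); // -1 (MISSING) if the target lacks the type
            Map<String, String> srcFeats = source.types.get(typeName);
            Map<String, String> tgtFeats = typeMap[t] == MISSING
                    ? Collections.<String, String>emptyMap()
                    : target.types.get(typeName);
            List<String> tgtFeatNames = new ArrayList<>(tgtFeats.keySet());
            featureMap[t] = new int[srcFeats.size()];
            int f = 0;
            for (Map.Entry<String, String> feat : srcFeats.entrySet()) {
                int tgtIndex = tgtFeatNames.indexOf(feat.getKey());
                // A feature present in both must have the same range, else report an error.
                if (tgtIndex != MISSING && !feat.getValue().equals(tgtFeats.get(feat.getKey()))) {
                    throw new IllegalArgumentException("Incompatible range for "
                            + typeName + ":" + feat.getKey());
                }
                featureMap[t][f++] = tgtIndex;
            }
        }
    }

    public int mapType(int sourceTypeCode) {
        return typeMap[sourceTypeCode];
    }

    public int mapFeature(int sourceTypeCode, int sourceFeatureIndex) {
        return featureMap[sourceTypeCode][sourceFeatureIndex];
    }

    public static void main(String[] args) {
        ToyTypeSystem fileTs = new ToyTypeSystem()
                .add("Token", "begin", "int", "end", "int", "pos", "String")
                .add("Sentence", "begin", "int", "end", "int");
        ToyTypeSystem toolTs = new ToyTypeSystem()
                .add("Sentence", "begin", "int", "end", "int")
                .add("Token", "end", "int", "begin", "int");
        TypeSystemMapperSketch mapper = new TypeSystemMapperSketch(fileTs, toolTs);
        System.out.println(mapper.mapType(0));       // Token: code 0 in the file, 1 in the tool
        System.out.println(mapper.mapFeature(0, 2)); // "pos" is absent in the tool: MISSING
    }
}
```

A lenient deserializer built along these lines would presumably skip values
whose type or feature maps to MISSING instead of failing.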

Try to arrange things so that the creation of the mapper can be done once per 
"set" of CASes, rather than once per CAS.

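One way to build the mapper once per "set" of CASes is to cache it under a key
that identifies the pair of type systems. The sketch below is a generic
memoizing holder; MapperCacheSketch, mapperFor and buildCount are hypothetical
names, and how the key is derived from two type systems is left as an
assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch: builds a mapper at most once per type-system-pair key. */
public class MapperCacheSketch<K, M> {

    private final Map<K, M> cache = new ConcurrentHashMap<>();
    private final Function<K, M> builder;
    private final AtomicInteger builds = new AtomicInteger();

    /** The builder is the (possibly expensive) comparison of two type systems. */
    public MapperCacheSketch(Function<K, M> builder) {
        this.builder = builder;
    }

    /** Returns the cached mapper for the key, building it on first use only. */
    public M mapperFor(K typeSystemPairKey) {
        return cache.computeIfAbsent(typeSystemPairKey, key -> {
            builds.incrementAndGet();
            return builder.apply(key);
        });
    }

    /** Number of times the builder actually ran; useful for checking reuse. */
    public int buildCount() {
        return builds.get();
    }
}
```

Every CAS in the set would then call mapperFor with the same key and share a
single mapper instance.
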
Note that managing out-of-type-system data is superseded by support for 
delta-CAS formats.


  was:
Extend the binary compressed serialization to support cases where the type 
systems are not exactly the same.

There are 2 use cases.
First: the source is a previously saved file. The goal is to deserialize it 
into e.g. a tool, where the type system in the tool may be somewhat different 
than the type system used to create the file. (For instance, it may be at a 
different version level).

Second: the source is a client for a UIMA-AS service.  In this case, the client 
has read the service's type system, and has merged it with its own.

Difference in the type systems could be:
Type exists in one, not in the other; 
Type exists in both, but with different features (including those from super 
types).  Features could be added/subtracted.  Features could have different 
ranges (incompatible ranges should cause error messages).

A suggested impl approach: create a mapper that maps typecodes and feature 
codes; set it up by comparing two type systems.  For the first use case, 
implement a version of deserialization that takes an extra input of the source 
type system, and creates the converter, and then does deserialization with the 
conversions.  For the 2nd use case, during initialization time, after the 
service's type system has been read (for merging into the client's type system 
definition), use this to create the same mapper between type codes / feature 
codes; when sending a CAS via binary serialization, send it via the mapping 
converter for type codes and feature codes. 

Try to arrange things so that the creation of the mapper can be done once per 
"set" of CASes, rather than once per CAS.


    
> add lenient version for binary compressed serialization/deserialization
> -----------------------------------------------------------------------
>
>                 Key: UIMA-2498
>                 URL: https://issues.apache.org/jira/browse/UIMA-2498
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Core Java Framework
>    Affects Versions: 2.4.0SDK
>            Reporter: Marshall Schor
>            Assignee: Marshall Schor
>            Priority: Minor
>             Fix For: 2.4.1SDK
>
>
> Extend the binary compressed serialization to support cases where the type 
> systems are not exactly the same.
> There are 2 use cases.
> First: the source is a previously saved file. The goal is to deserialize it 
> into e.g. a tool, where the type system in the tool may be somewhat different 
> than the type system used to create the file. (For instance, it may be at a 
> different version level).
> Second: the source is a client for a UIMA-AS service.  In this case, the 
> client has read the service's type system, and has merged it with its own.
> Differences between the type systems could be:
> A type exists in one, but not in the other;
> A type exists in both, but with different features (including those inherited
> from supertypes).  Features could be added or removed.  Features could have
> different ranges (incompatible ranges should cause error messages).
> A suggested implementation approach: create a mapper that maps type codes and
> feature codes; set it up by comparing the two type systems.  For the first use
> case, implement a version of deserialization that takes the source type system
> as an extra input, creates the converter, and then deserializes using the
> conversions.  For the second use case, at initialization time, after the
> service's type system has been read (for merging into the client's type system
> definition), use it to create the same mapper between type codes and feature
> codes; when sending a CAS via binary serialization, send it through the
> mapping converter for type codes and feature codes.
> Try to arrange things so that the creation of the mapper can be done once per 
> "set" of CASes, rather than once per CAS.
> Note that managing out-of-type-system data is superseded by support for 
> delta-CAS formats.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
