Marshall Schor created UIMA-5164:
------------------------------------
Summary: uv3 support for arbitrary Java objects in the CAS
Key: UIMA-5164
URL: https://issues.apache.org/jira/browse/UIMA-5164
Project: UIMA
Issue Type: New Feature
Components: e
Reporter: Marshall Schor
Assignee: Marshall Schor
Priority: Minor
Fix For: 3.0.0SDKexp
With JCas, it has been possible to include arbitrary Java objects in the Java
instance of a JCas class. The problem has been that there was no architected
way for these objects to participate in broader UIMA interoperability concepts
such as serialization and remote annotators. Furthermore, JCas objects were
optional and might not be used.
UIMA V3 implements Feature Structures in the CAS as JCas objects directly, so
these are now always present and reliable. This means that when an
implementation adds arbitrary Java objects (e.g., a special HashSet containing
Feature Structures) to a JCas class definition, they are reliably present.
Here's how we could make this all work in v3.
A user would first pick some Java class to emulate in the CAS. The data in the
emulated class would need to support a serialized form, a "snapshot" of the
data at a particular moment, that could be put into the CAS using a fixed
number of UIMA features of normal UIMA data types, including Feature
Structures. For example, an
ArrayList<FeatureStructure> could be put into the CAS as an FSArray instance
of the current size; a Map<Integer, FeatureStructure> could be put into the CAS
as an IntegerArray and an FSArray, etc. The snapshot would be produced
whenever needed, for example, during serialization. A corresponding
transformation (used, for instance, during deserialization) would convert the
snapshot data back into the emulated Java class instance.
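
As a rough illustration of the snapshot idea for the Map example above, here is
a minimal sketch in plain Java. It does not use the UIMA API: FakeFs and
MapSnapshot are hypothetical stand-ins, with an int[] playing the role of the
IntegerArray feature and a FakeFs[] playing the role of the FSArray feature.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a UIMA Feature Structure; not the real UIMA type.
final class FakeFs {
    final String label;
    FakeFs(String label) { this.label = label; }
}

// Hypothetical snapshot holder: two parallel arrays, standing in for an
// IntegerArray feature (keys) and an FSArray feature (values).
final class MapSnapshot {
    final int[] keys;
    final FakeFs[] values;

    MapSnapshot(int[] keys, FakeFs[] values) {
        this.keys = keys;
        this.values = values;
    }

    // Capture the map contents as they are at this moment.
    static MapSnapshot save(Map<Integer, FakeFs> map) {
        int[] keys = new int[map.size()];
        FakeFs[] values = new FakeFs[map.size()];
        int i = 0;
        for (Map.Entry<Integer, FakeFs> e : map.entrySet()) {
            keys[i] = e.getKey();
            values[i] = e.getValue();
            i++;
        }
        return new MapSnapshot(keys, values);
    }

    // Rebuild the Java map from the snapshot, as deserialization would.
    Map<Integer, FakeFs> restore() {
        Map<Integer, FakeFs> map = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], values[i]);
        }
        return map;
    }
}
```

The snapshot has a fixed number of "features" (two) regardless of how many
entries the map holds; only the array lengths vary.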
This new kind of hybrid object would be implemented with a custom JCas cover
class which wrapped the emulated Java class instance. It would also have as
features those needed for the "snapshot" representation.
The user would need to:
* define a UIMA type; this type would include the feature definitions needed
for the snapshot.
* create the corresponding JCas cover class for that type
* add 3 extra methods in the cover class, all methods defined by a new UIMA
interface "UimaSerializable"
** _init_from_cas_data()
** _save_to_cas_data()
** clone
The _init_from_cas_data method would use the CAS data in this Feature Structure
to initialize the emulated Java class instance. The framework would call it
whenever it makes a new instance with non-empty Feature Structure data,
typically from routines such as the CAS copier and deserialization.
Similarly, the _save_to_cas_data method would be called by the framework as
part of serialization; it would extract data from the emulated Java class
instance and save it as CAS features.
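
The two data-moving methods described above might look roughly like the
following sketch. It is plain Java with no UIMA dependency, so every name here
is an assumption made for illustration: SerializableSketch stands in for the
proposed interface (the clone method is omitted), ListCoverSketch stands in for
a JCas cover class, a String[] field stands in for an FSArray feature, and
FrameworkSketch only imitates when the framework would call each method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the proposed interface; real signatures may differ.
interface SerializableSketch {
    void _init_from_cas_data();
    void _save_to_cas_data();
}

// Hypothetical cover class wrapping an ArrayList. The String[] field plays the
// role of a CAS feature holding the snapshot.
final class ListCoverSketch implements SerializableSketch {
    String[] casFeature = new String[0];          // snapshot form
    final List<String> list = new ArrayList<>();  // emulated Java object

    // Initialize the emulated Java object from the CAS data.
    @Override
    public void _init_from_cas_data() {
        list.clear();
        list.addAll(Arrays.asList(casFeature));
    }

    // Extract data from the emulated Java object and save it as CAS data.
    @Override
    public void _save_to_cas_data() {
        casFeature = list.toArray(new String[0]);
    }
}

// Hypothetical framework hooks showing when each method would be called.
final class FrameworkSketch {
    static String[] serialize(ListCoverSketch fs) {
        fs._save_to_cas_data();          // take the snapshot first
        return fs.casFeature.clone();
    }

    static ListCoverSketch deserialize(String[] data) {
        ListCoverSketch fs = new ListCoverSketch();
        fs.casFeature = data.clone();
        if (data.length > 0) {
            fs._init_from_cas_data();    // only for non-empty data
        }
        return fs;
    }
}
```

Between these calls, user code would work only with the wrapped list; the
snapshot field is touched only at the serialization boundaries.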
This Jira adds support for this approach; other Jiras will add some likely
popular new types (example: FSArrayList - like ArrayList<TOP>). Users can
(easily?) add types of their own, for instance, if they need a peculiar kind
of Set of Feature Structures, perhaps built on top of ConcurrentSkipListSet
using a special definition of set-member-equals.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)