Good question.

UIMA today supports serialization for multiple use cases, including:
  - CAS storage on disk
  - communication with other (sub) systems
  - backwards compatibility with previous serialized forms
  - "filtering" for type/feature subsetting

The formats for these include various aspects:
  - custom compression
  - delta CAS information (additional information about what was changed,
including adds to and removes from indexes)

The external formats include XCAS, XMI, 3 styles of binary (one of which
supports the JNI bridge to C/C++ annotators), and JSON (serialization only;
deserialization is not implemented).  There's also one that uses SOAP/Vinci.

The current Apache v2 licensed open source approaches to serialization have
become quite sophisticated.  Kryo, for instance, can use user-written
serializers, but it also comes with its own set of default serializers for
popular Java classes.  For other Java classes it can generate serialization
code (using a byte-code generator) that is comparable to hand-written
serialization (according to their docs; I haven't tried it myself yet).
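
To make the "hand-written serialization" end of that comparison concrete, here
is a minimal sketch using only the JDK's Externalizable interface.  It does not
use Kryo or any UIMA API; SpanDemo and Span are hypothetical names standing in
for an annotation-like object, and the three fields are assumptions chosen for
illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical example class; not part of UIMA or Kryo.
public class SpanDemo {

    // An annotation-like object that writes and reads its own fields.
    public static class Span implements Externalizable {
        private int begin;
        private int end;
        private String label;

        // Externalizable requires a public no-arg constructor.
        public Span() { }

        public Span(int begin, int end, String label) {
            this.begin = begin;
            this.end = end;
            this.label = label;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            // The object decides its own wire layout, field by field.
            out.writeInt(begin);
            out.writeInt(end);
            out.writeUTF(label);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            // Fields must be read back in the order they were written.
            begin = in.readInt();
            end = in.readInt();
            label = in.readUTF();
        }

        public int getBegin() { return begin; }
        public int getEnd() { return end; }
        public String getLabel() { return label; }
    }

    // Serialize one Span into a byte array.
    public static byte[] toBytes(Span span) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(span);
        }
        return buffer.toByteArray();
    }

    // Rebuild a Span from the bytes produced by toBytes.
    public static Span fromBytes(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Span) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Span copy = fromBytes(toBytes(new Span(0, 4, "Token")));
        System.out.println(
            copy.getBegin() + "-" + copy.getEnd() + " " + copy.getLabel());
    }
}
```

The point of the sketch is only the division of labor: here each class carries
its own read/write code, whereas a framework-driven approach keeps that logic
outside the objects, which is what makes things like filtering or delta
information easier to handle in one place.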

Keeping in mind UIMA's overall purpose of supporting and encouraging community
development of reusable assets for unstructured information processing, and the
various use cases for serialization, I'm currently thinking it would be
beneficial to restrict the serialization formats to essentially one style.  The
downside, that a single style may be less efficient for some use cases, might
be mitigated by the current advanced approaches to serialization.

This whole area (of serialization / deserialization) could use more thought.
The new frameworks are using fancy NIO approaches that UIMA could benefit
from as well.

-Marshall 

On 10/29/2015 3:29 AM, Richard Eckart de Castilho (JIRA) wrote:
> ... <snip> ...
> Richard Eckart de Castilho commented on UIMA-4668:
> --------------------------------------------------
>
> Is UIMA responsible for the serialization or could the objects themselves do 
> anything about it, for example be Serializable or Externalizable?
> ... <snip> ...
