Good question.  UIMA today supports serialization for multiple use cases,
including:

  - CAS storage on disk
  - communication with other (sub) systems
  - backwards compatibility with previous serialized forms
  - "filtering" for type/feature subsetting
The formats for these include various aspects:

  - custom compression
  - delta CAS information (additional information re: what was changed,
    including adds/removes to indexes)

The external formats include XCAS, XMI, 3 styles of binary (one of which
supports the JNI bridge to C/C++ annotators), and JSON (serialization only;
deserialization is not done).  There's also one that uses SOAP/Vinci.

The current Apache v2 licensed open source approaches to serialization have
become quite sophisticated.  Kryo, for instance, can use user-written
serializers, but it also comes with its own set of default serializers for the
popular Java objects; for other Java objects it can generate (using a
byte-code generator) serialization code that is comparable to hand-written
serialization (according to their docs - I haven't tried it myself, yet).

Keeping in mind UIMA's overall purpose in supporting / encouraging community
development of reusable assets for unstructured information processing, and
the various use cases for serialization, I'm currently thinking it would be
beneficial to restrict the serialization formats to essentially one style.
The potential inefficiency of a single style might be mitigated by the current
advanced approaches to serialization.

This whole area (of serialization / deserialization) could use more thought.
The new frameworks are using fancy NIO approaches that UIMA could benefit
from, as well.

-Marshall

On 10/29/2015 3:29 AM, Richard Eckart de Castilho (JIRA) wrote:
> ... <snip> ...
> Richard Eckart de Castilho commented on UIMA-4668:
> --------------------------------------------------
>
> Is UIMA responsible for the serialization or could the objects themselves do
> anything about it, for example be Serializable or Externalizable?
> ... <snip> ...
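
For reference, the Externalizable option raised in the quoted question might look roughly like the sketch below. It uses only plain java.io. The `Token` class and the `toBytes` / `fromBytes` helpers are hypothetical names made up for illustration; they are not UIMA types or UIMA APIs, and this is not how UIMA serializes a CAS today.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

public class ExternalizableSketch {

    // Hypothetical annotation-like object (NOT a UIMA class) that
    // controls its own serialized form via Externalizable.
    public static class Token implements Externalizable {
        private int begin;
        private int end;
        private String pos;

        // Externalizable requires a public no-arg constructor.
        public Token() {}

        public Token(int begin, int end, String pos) {
            this.begin = begin;
            this.end = end;
            this.pos = pos;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            // The object decides exactly which fields are written, and in what order.
            out.writeInt(begin);
            out.writeInt(end);
            out.writeUTF(pos);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            // Fields must be read back in the same order they were written.
            begin = in.readInt();
            end = in.readInt();
            pos = in.readUTF();
        }

        public int getBegin() { return begin; }
        public int getEnd() { return end; }
        public String getPos() { return pos; }
    }

    // Serialize one Token to a byte array using standard Java object streams.
    public static byte[] toBytes(Token t) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(t);
        }
        return buf.toByteArray();
    }

    // Rebuild a Token from bytes produced by toBytes.
    public static Token fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Token) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Token copy = fromBytes(toBytes(new Token(0, 4, "NN")));
        // prints "0-4 NN"
        System.out.println(copy.getBegin() + "-" + copy.getEnd() + " " + copy.getPos());
    }
}
```

Note that this only moves the per-object encoding into the object itself. The framework-level concerns listed above (index adds/removes, delta CAS information, type/feature filtering) would presumably still need handling outside the individual objects.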
