Author: schor
Date: Fri May 20 15:14:26 2016
New Revision: 1744753
URL: http://svn.apache.org/viewvc?rev=1744753&view=rev
Log:
no Jira - add table consolidating useful comparative information about the
alternative CAS Serialization capabilities
Modified:
uima/uimaj/trunk/uima-docbook-tutorials-and-users-guides/src/docbook/tug.application.xml
URL:
http://svn.apache.org/viewvc/uima/uimaj/trunk/uima-docbook-tutorials-and-users-guides/src/docbook/tug.application.xml?rev=1744753&r1=1744752&r2=1744753&view=diff
==============================================================================
---
uima/uimaj/trunk/uima-docbook-tutorials-and-users-guides/src/docbook/tug.application.xml
(original)
+++
uima/uimaj/trunk/uima-docbook-tutorials-and-users-guides/src/docbook/tug.application.xml
Fri May 20 15:14:26 2016
@@ -485,17 +485,21 @@ ae.destroy();</programlisting></para>
<title>Saving CASes to file systems or general Streams</title>
<para>The UIMA framework provides multiple APIs to save and restore the
contents of a CAS to streams.
+ Two common uses of this are to save CASes to the file system, and to
send CASes to other processes running
+ on remote systems.</para>
+
+ <para>
The CASes can be serialized in multiple formats:
<itemizedlist>
<listitem>
<para>Binary formats:
<itemizedlist>
<listitem>
- <para>plain binary: This is used to communicate with remote
services, and also for interfacing with
+ <para>plain binary: This is used to communicate with remote
services, and also for interfacing with
annotators written in C/C++ or related languages via the Java Native
Interface (JNI)</para>
</listitem>
<listitem>
- <para>Two forms of compressed binary. The recommend one is
form 6, which also allows
+ <para>Compressed binary: There are two forms of compressed
binary. The recommended one is form 6, which also allows
type filtering. See <olink targetdoc="&uima_docs_ref;"
targetptr="ugr.ref.compress.overview"/>.</para>
</listitem>
</itemizedlist>
@@ -515,6 +519,141 @@ ae.destroy();</programlisting></para>
</itemizedlist>
</para>
+ <para>Each of these serializations has different capabilities,
summarized in the table below.
+ <table frame="all" id="ugr.tug.tbl.serialization_capabilities">
+ <title>Serialization Capabilities</title>
+ <tgroup cols="7" rowsep="1" colsep="1">
+ <colspec colname="c1"/>
+ <colspec colname="c2"/>
+ <colspec colname="c3"/>
+ <colspec colname="c4"/>
+ <colspec colname="c5"/>
+ <colspec colname="c6"/>
+ <colspec colname="c7"/>
+ <thead>
+ <row>
+ <entry align="center"></entry>
+ <entry align="center">XCAS</entry>
+ <entry align="center">XMI</entry>
+ <entry align="center">JSON</entry>
+ <entry align="center">Binary</entry>
+ <entry align="center">Cmpr 4</entry>
+ <entry align="center">Cmpr 6</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>Output</entry>
+ <entry>Output Stream</entry>
+ <entry>Output Stream</entry>
+ <entry>Output Stream, File, Writer</entry>
+ <entry>Output Stream</entry>
+ <entry>Output Stream, Data Output Stream, File</entry>
+ <entry>Output Stream, Data Output Stream, File</entry>
+ </row>
+ <row>
+ <entry>Lists/Arrays inline formatting?</entry>
+ <entry>-</entry>
+ <entry>Yes</entry>
+ <entry>Yes</entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ </row>
+ <row>
+ <entry>Formatted?</entry>
+ <entry>-</entry>
+ <entry>Yes</entry>
+ <entry>Yes</entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ </row>
+ <row>
+ <entry>Type Filtering?</entry>
+ <entry>-</entry>
+ <entry>Yes</entry>
+ <entry>Yes</entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ <entry>Yes</entry>
+ </row>
+ <row>
+ <entry>Delta CAS?</entry>
+ <entry>-</entry>
+ <entry>Yes</entry>
+ <entry>-</entry>
+ <entry>Yes</entry>
+ <entry>Yes</entry>
+ <entry>Yes</entry>
+ </row>
+ <row>
+ <entry>OOTS?</entry>
+ <entry>Yes</entry>
+ <entry>Yes</entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ </row>
+ <row>
+ <entry>Only send indexed + reachable FSs?</entry>
+ <entry>Yes</entry>
+ <entry>Yes</entry>
+ <entry>Yes</entry>
+ <entry>send all</entry>
+ <entry>send all</entry>
+ <entry>Yes</entry>
+ </row>
+ <row>
+ <entry>NameSpace/Schemas?</entry>
+ <entry>-</entry>
+ <entry>Yes</entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ <entry>-</entry>
+ </row>
+ </tbody>
+ </tgroup>
+
+ </table>
+ </para>
+
+ <para>In the above table, Cmpr 4 and Cmpr 6 refer to forms 4 and 6 of
the compressed binary serialization.</para>
+
+ <para>For the XMI and JSON formats, lists and arrays can sometimes be
formatted "inline".
+ In this representation, the elements are formatted directly as the value
of a particular
+ feature. This is only done if the arrays and lists are not
multiply-referenced.</para>
+
+ <para>Type Filtering support enables only a subset of the types and/or
features to be
+ serialized. An additional type system object is used to specify the
types to be included
+ in the serialization. This can be useful, for instance, when sending a
CAS to a remote service,
+ where the remote service only uses a small number of the types and
features, to reduce the size
+ of the serialized CAS.</para>
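
The type filtering idea described above can be illustrated with a minimal, framework-free Java sketch. It does not use the UIMA API: the names TypeFilterSketch, Fs, and filter are hypothetical stand-ins, and the "target type system" is modeled as a plain map from type name to the feature names the receiver knows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustration only; these are hypothetical types, not UIMA classes. */
final class TypeFilterSketch {

    /** Stand-in for a Feature Structure: a type name plus feature values. */
    static final class Fs {
        final String type;
        final Map<String, String> features = new LinkedHashMap<>();

        Fs(String type) { this.type = type; }

        Fs with(String feature, String value) {
            features.put(feature, value);
            return this;
        }
    }

    /**
     * Keeps only FSs whose type appears in the target type system, and within
     * each kept FS only the features that the target declares for that type.
     */
    static List<Fs> filter(List<Fs> cas, Map<String, Set<String>> targetTypeSystem) {
        List<Fs> out = new ArrayList<>();
        for (Fs fs : cas) {
            Set<String> keptFeatures = targetTypeSystem.get(fs.type);
            if (keptFeatures == null) {
                continue; // type not in the target type system: drop the whole FS
            }
            Fs copy = new Fs(fs.type);
            for (Map.Entry<String, String> e : fs.features.entrySet()) {
                if (keptFeatures.contains(e.getKey())) {
                    copy.features.put(e.getKey(), e.getValue());
                }
            }
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Fs> cas = Arrays.asList(
            new Fs("Token").with("begin", "0").with("pos", "NN"),
            new Fs("Parse").with("tree", "(S)"));
        Map<String, Set<String>> target = new HashMap<>();
        target.put("Token", new HashSet<>(Arrays.asList("begin")));
        for (Fs fs : filter(cas, target)) {
            System.out.println(fs.type + " " + fs.features);
        }
    }
}
```

In this sketch the receiver only declares Token.begin, so the Parse FS and the Token.pos feature are left out of the filtered result, which is the size reduction the paragraph describes.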
+
+ <para>Delta CAS support makes use of a "mark" set in the CAS, and only
serializes changes in the CAS,
+ both new and modified Feature Structures, that were added or changed
after the mark was set.
+ This is useful for remote services, supporting the use-case where a
large CAS is sent to the service,
+ which sets the mark in the received CAS, and then adds a small amount of
information;
+ the Delta CAS then serializes only that small amount as the "reply" sent
back to the sender.</para>
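
The mark-based delta idea described above can be sketched in plain Java. This is a simplified model under stated assumptions, not the UIMA implementation: DeltaSketch and its methods are hypothetical, Feature Structures are reduced to strings stored in creation order, and the mark is simply the store size at the time it was set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Illustration only; a hypothetical model of mark-based delta serialization. */
final class DeltaSketch {
    private final List<String> fss = new ArrayList<>();           // FSs in creation order
    private final SortedSet<Integer> modifiedBelowMark = new TreeSet<>();
    private int mark = -1;                                        // -1 means no mark set

    /** Adds an FS and returns its id (its position in creation order). */
    int add(String value) {
        fss.add(value);
        return fss.size() - 1;
    }

    /** Everything created so far counts as "already known to the sender". */
    void setMark() {
        mark = fss.size();
        modifiedBelowMark.clear();
    }

    /** Changes an existing FS; changes below the mark are remembered for the delta. */
    void update(int id, String value) {
        fss.set(id, value);
        if (mark >= 0 && id < mark) {
            modifiedBelowMark.add(id);
        }
    }

    /** Returns only what changed after the mark: modified old FSs, then new FSs. */
    List<String> delta() {
        if (mark < 0) {
            throw new IllegalStateException("no mark set");
        }
        List<String> out = new ArrayList<>();
        for (int id : modifiedBelowMark) {
            out.add("modified " + id + "=" + fss.get(id));
        }
        for (int id = mark; id < fss.size(); id++) {
            out.add("new " + id + "=" + fss.get(id));
        }
        return out;
    }

    public static void main(String[] args) {
        DeltaSketch cas = new DeltaSketch();
        cas.add("document");
        int token = cas.add("token");
        cas.setMark();                 // the service sets the mark on receipt
        cas.add("sentiment");          // new after the mark
        cas.update(token, "Token");    // modified after the mark
        System.out.println(cas.delta());
    }
}
```

The unchanged "document" FS never appears in the delta, which mirrors the use-case where the reply carries only the small amount of information added by the service.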
+
+ <para>OOTS means "Out of Type System" support, intended to support the
use-case where a CAS is being sent
+ to a remote application. This supports deserializing an incoming CAS
where
+ some of the types and/or features may not be present in the receiving
CAS's type system. A "lenient"
+ option on the deserialization permits the deserialization to proceed,
with the out-of-type-system
+ information preserved so that when the CAS is subsequently reserialized
(in the use-case, to be
+ returned to the sender), the out-of-type-system information is
merged back into the output stream.
+ </para>
+
+ <para>The Binary and Compressed Form 4 serializations send all the
Feature Structures in the CAS,
+ in the order they were created in the CAS. The other methods only
+ send Feature Structures that are reachable, either by
+ their being in some CAS index, or being referenced
+ as a feature of another Feature Structure which is reachable.</para>
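
The difference between "send all" and "send only indexed and reachable" can be sketched as a graph traversal. The Java below is a minimal illustration and not UIMA code: SketchFs and ReachabilitySketch are hypothetical names, and the indexes are modeled as a plain list of root Feature Structures.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a Feature Structure; not a UIMA class. */
final class SketchFs {
    final String name;
    final List<SketchFs> refs = new ArrayList<>(); // features that point at other FSs

    SketchFs(String name) { this.name = name; }
}

/** Illustration only: computes which FSs a reachability-based serializer would send. */
final class ReachabilitySketch {

    /** Starts from the indexed FSs and follows references transitively. */
    static Set<SketchFs> reachable(List<SketchFs> indexed) {
        Set<SketchFs> seen = new LinkedHashSet<>();
        Deque<SketchFs> work = new ArrayDeque<>(indexed);
        while (!work.isEmpty()) {
            SketchFs fs = work.poll();
            if (seen.add(fs)) {          // first visit: also visit what it references
                work.addAll(fs.refs);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        SketchFs a = new SketchFs("a"); // in an index
        SketchFs b = new SketchFs("b"); // referenced by a
        SketchFs c = new SketchFs("c"); // created, but neither indexed nor referenced
        a.refs.add(b);
        List<SketchFs> allCreated = Arrays.asList(a, b, c); // what "send all" would emit
        Set<SketchFs> sent = reachable(Arrays.asList(a));
        System.out.println(allCreated.size() + " created, " + sent.size() + " reachable");
    }
}
```

Here "c" corresponds to a Feature Structure that Binary and Compressed Form 4 would still send, because they emit everything in creation order, while the other formats would omit it.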
+
+ <para>The NameSpace/Schema support allows specifying a set of schemas,
each one corresponding to a particular
+ namespace, used in XMI serialization.</para>
<para>To save an XMI representation of a CAS, use the
<literal>serialize</literal> method of the class
<literal>org.apache.uima.util.XmlCasSerializer</literal>. To save an
XCAS representation of a CAS,
use the class
<literal>org.apache.uima.cas.impl.XCASSerializer</literal> instead; see the
Javadocs