Hi,

I am looking for a way to improve loading times in an application, so I did a 
little experiment with binary CAS serialization to see if it was superior to 
XMI serialization. For serialization I used the CASCompleteSerializer to 
serialize the type-system and heaps into the same file using Java object 
serialization - at least that is what I understood it should do. To read in 
these files, I would deserialize the CASCompleteSerializer and initialize a CAS 
from it using CASImpl.reinit().
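
For illustration, the "serialized java, gzip" round trip can be sketched with the JDK alone. This is only a sketch: the `Snapshot` class is a hypothetical stand-in for the serializable object (in the experiment, the CASCompleteSerializer), and the class and method names are made up for this example.

```java
import java.io.*;
import java.nio.file.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipObjectIo {
    /** Hypothetical stand-in for the serializable CAS snapshot (type system + heap). */
    static final class Snapshot implements Serializable {
        private static final long serialVersionUID = 1L;
        final String typeSystem;
        final int[] heap;

        Snapshot(String typeSystem, int[] heap) {
            this.typeSystem = typeSystem;
            this.heap = heap;
        }
    }

    /** Writes one object to a gzip-compressed file using Java object serialization. */
    static void write(Serializable obj, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new GZIPOutputStream(new BufferedOutputStream(Files.newOutputStream(file))))) {
            out.writeObject(obj);
        }
    }

    /** Reads back the single object stored by write(). */
    static Object read(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new BufferedInputStream(Files.newInputStream(file))))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("cas", ".bin.gz");
        write(new Snapshot("demo-types", new int[] {1, 2, 3}), file);
        Snapshot back = (Snapshot) read(file);
        System.out.println(back.typeSystem + " / heap size " + back.heap.length);
        Files.delete(file);
    }
}
```

With UIMA, the object read back would then be handed to CASImpl.reinit() instead of being used directly.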

96.400 files

format                           time (h:mm:ss)  size (bytes)
plain text (uncompressed)      :                   581.865.593
binary (serialized java, gzip) : 0:47:02.835     3.555.449.597
xmi (gzip)                     : 1:20:31.535     4.712.633.769

So binary serialization takes about 60% of the time that XMI serialization 
needs and uses about 75% of the space.
I haven't run a reading experiment yet, but I suppose the improvement should 
be on a similar level, if not better.
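
The percentages can be rechecked from the figures in the table. The sketch below only redoes that arithmetic; the class and method names are made up for this example. It comes out at roughly 58% of the time and 75% of the space.

```java
import java.time.Duration;
import java.util.Locale;

final class SerializationRatios {
    /** Builds a duration from the h:mm:ss.SSS values listed in the table. */
    static Duration hms(int hours, int minutes, int seconds, int millis) {
        return Duration.ofHours(hours).plusMinutes(minutes)
                .plusSeconds(seconds).plusMillis(millis);
    }

    /** Binary time divided by XMI time. */
    static double timeRatio() {
        Duration binary = hms(0, 47, 2, 835);
        Duration xmi = hms(1, 20, 31, 535);
        return (double) binary.toMillis() / xmi.toMillis();
    }

    /** Binary size divided by XMI size, both gzip-compressed. */
    static double sizeRatio() {
        long binaryBytes = 3_555_449_597L;
        long xmiBytes = 4_712_633_769L;
        return (double) binaryBytes / xmiBytes;
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT,
                "time: %.1f%%, size: %.1f%%", 100 * timeRatio(), 100 * sizeRatio()));
    }
}
```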

I am also not sure yet about the drawbacks of binary serialization and in 
which scenarios they apply. The drawbacks I have seen so far are:

- The type system is stored redundantly in every output file.
- The type system configured with CASImpl.reinit() may be different from the 
one which was used to initialize the pipeline, so CAS-based annotators relying 
on typeSystemInit() may not be configured with the correct types - this is a 
hypothesis I didn't test.
- Serialized Java objects may become unreadable after refactoring within the 
UIMA framework. However, there is yet another binary CAS serialization in UIMA 
which uses the DataOutputStream and may be more stable.
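
To illustrate why a DataOutputStream-based format may be more stable: the byte layout is spelled out explicitly instead of being derived from class structure, so refactoring a class does not change what is on disk. The sketch below is a hypothetical format for an int heap (magic number, version, length, values); it is not UIMA's actual binary format, and all names are made up for this example.

```java
import java.io.*;

final class HeapCodec {
    private static final int MAGIC = 0x48454150; // ASCII "HEAP", hypothetical marker
    private static final int VERSION = 1;        // hypothetical format version

    /** Encodes the heap as: magic, version, length, then one int per value. */
    static byte[] encode(int[] heap) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(MAGIC);
            out.writeInt(VERSION);
            out.writeInt(heap.length);
            for (int value : heap) {
                out.writeInt(value);
            }
        }
        return bytes.toByteArray();
    }

    /** Decodes data written by encode(), rejecting unknown markers and versions. */
    static int[] decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            if (in.readInt() != MAGIC) {
                throw new IOException("not a heap file");
            }
            int version = in.readInt();
            if (version != VERSION) {
                throw new IOException("unsupported version " + version);
            }
            int[] heap = new int[in.readInt()];
            for (int i = 0; i < heap.length; i++) {
                heap[i] = in.readInt();
            }
            return heap;
        }
    }
}
```

The explicit version field is what would allow a later reader to keep supporting older files.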

Has anybody ever used any form of binary CAS serialization outside Vinci/UIMA-AS?

Cheers,

-- Richard

-- 
------------------------------------------------------------------- 
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab (UKP-TUD) 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
[email protected] 
www.ukp.tu-darmstadt.de 
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
------------------------------------------------------------------- 





