Re: CAS serialization performance: XMI vs. Java serialization

Marshall Schor Wed, 15 Aug 2012 08:21:38 -0700

As a side comment, in previous benchmarking I've done on other systems, I've
found that using memory mapped IO (part of Java NIO) can make a lot of 
difference.
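
For reference, a minimal sketch of a memory-mapped read with Java NIO. The
class and method names (MappedRead, readAll) are made up for illustration and
are not part of UIMA. It assumes the file fits into a single mapping (under
2 GB).

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not a UIMA class.
final class MappedRead {
    /** Reads a whole file through a read-only memory mapping. */
    static byte[] readAll(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            if (size > Integer.MAX_VALUE) {
                // A single mapping (and a byte[]) cannot exceed 2 GB.
                throw new IOException("File too large for a single mapping: " + file);
            }
            // Map the file into memory; the OS pages it in on demand.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            byte[] data = new byte[(int) size];
            buffer.get(data); // bulk copy from the mapping into the array
            return data;
        }
    }
}
```

The resulting bytes could then be wrapped in a ByteArrayInputStream and handed
to whatever deserializer is in use. Whether this helps in a given setup would
need to be measured.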


Also, when we put in gzip we expected it to speed things up, but it actually
slowed things down quite a bit.
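
One way to check the effect of gzip in a particular setup is to run the same
Java object serialization with and without the compression wrapper and time
both. A rough sketch follows. GzipToggle and its methods are hypothetical
names, and it works on any Serializable value rather than on a CAS.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for comparing gzip vs. plain Java serialization.
final class GzipToggle {
    /** Serializes the value, optionally through a gzip stream. */
    static byte[] write(Serializable value, boolean gzip) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        OutputStream sink = gzip ? new GZIPOutputStream(bytes) : bytes;
        // Closing the object stream also finishes and closes the gzip stream.
        try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    /** Reads back a value written by write() with the same gzip setting. */
    static Object read(byte[] data, boolean gzip)
            throws IOException, ClassNotFoundException {
        InputStream source = new ByteArrayInputStream(data);
        if (gzip) {
            source = new GZIPInputStream(source);
        }
        try (ObjectInputStream in = new ObjectInputStream(source)) {
            return in.readObject();
        }
    }
}
```

Wrapping the calls in System.nanoTime() measurements over a representative set
of documents would show whether the smaller files outweigh the extra CPU work.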

-Marshall


On 8/15/2012 4:09 AM, Richard Eckart de Castilho wrote:
> Hi,
>
> I am looking for a way to improve loading times in an application, so I did a 
> little experiment with binary CAS serialization to see if it was superior to 
> XMI serialization. For serialization I used the CASCompleteSerializer to 
> serialize the type-system and heaps into the same file using Java object 
> serialization - at least that is what I understood it should do. To read in 
> these files, I would deserialize the CASCompleteSerializer and initialize a 
> CAS from it using CASImpl.reinit().
>
> 96.400 files
>
> plain text (uncompressed)      :                 581.865.593 Byte
> binary (serialized java, gzip) : 0:47:02.835   3.555.449.597 Byte 
> xmi (gzip)                     : 1:20:31.535   4.712.633.769 Byte
>
> So binary takes about 60% of the time xmi serialization would need and uses 
> about 75% of the space.
> I haven't run a reading experiment yet, but I suppose the improvement should 
> be on a similar level, if not better.
>
> I am also not sure yet about the draw-backs of binary serialization and in 
> which scenarios they apply. The draw-backs I saw so far are:
>
> - The type system is stored redundantly in every output file.
> - The type system configured with CASImpl.reinit() may be different from the 
> one which was used to initialize the pipeline, so CAS-based annotators 
> relying on typeSystemInit() may not be configured with the correct types. 
> This is a hypothesis I didn't test.
> - Serialized Java objects may become unreadable due to refactoring within 
> the UIMA framework. However, there is yet another binary CAS serialization 
> in UIMA which uses the DataOutputStream and may be more stable.
>
> Did anybody ever use any form of binary CAS serialization outside 
> Vinci/UIMA-AS?
>
> Cheers,
>
> -- Richard
>
