On 07.01.2016, at 15:12, Marshall Schor <[email protected]> wrote:
> 
> Thanks for explaining this "use case". 
> 
> I was a bit unclear on the two instances of deserialization time. 
> One (the 70%) was xmi, the other (2%) was S+.  From reading the email chain,
> it seems S+ is the "CasCompleteSerializer".  This switches to plain binary
> mode. So you would avoid the XML parsing overhead. 
> 
> But I think both deserializations would have the same issue around
> "allow_dups" if that was where the substantial part of the slowdown was being
> spent, since both would add all those annotations to the index.  Perhaps that
> was another use case though...  Am I mixing these up?

My understanding is that the CasCompleteSerializer (de)serializes the heap
structures and indexes as-is. So on loading, FSes do not pass through
addToIndexes() and the allow_dups check at all. This should be what makes S and
S+ faster than the other approaches, which do call addToIndexes().
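
To illustrate the difference, here is a toy sketch in plain Java. It is not
UIMA code: the class and method names are made up, and the linear duplicate
scan is a deliberately naive stand-in for whatever addToIndexes() and the
allow_dups check really cost. It only shows that one path does work per FS
while the other copies the stored index back verbatim.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these names are UIMA classes or methods.
final class ToyIndexLoad {
    // Number of equality checks performed by the last loadViaAdd() call.
    static long comparisons = 0;

    // Path A: every restored item is re-added one by one, with a duplicate
    // check against what is already indexed (stand-in for addToIndexes()).
    static List<Integer> loadViaAdd(int[] serialized) {
        comparisons = 0;
        List<Integer> index = new ArrayList<>();
        for (int fs : serialized) {
            boolean dup = false;
            for (int existing : index) {
                comparisons++;
                if (existing == fs) {
                    dup = true;
                    break;
                }
            }
            if (!dup) {
                index.add(fs);
            }
        }
        return index;
    }

    // Path B: the stored index is copied back as-is, with no per-item checks.
    static List<Integer> loadAsIs(int[] serializedIndex) {
        List<Integer> index = new ArrayList<>(serializedIndex.length);
        for (int fs : serializedIndex) {
            index.add(fs);
        }
        return index;
    }
}
```

Both paths end up with the same index contents; for four distinct items the
checked path performs 6 comparisons in this toy model, the as-is path none.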

Btw., Cas(Complete)Serializer also has the nice effect that the addresses of
FSes remain fully stable, since even unindexed/unreachable FSes are stored and
loaded. I think all other serializers drop unreachable FSes, which can cause
addresses to change. I have a use case in WebAnno where I'm absolutely relying
on the stable addresses provided by the Cas(Complete)Serializer.
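
The address point can be pictured with another toy sketch in plain Java.
Again, this is not UIMA code: the "heap" is just a list, an "address" is a list
position, and both method names are hypothetical. It assumes that a serializer
which drops unreachable FSes packs the survivors together on loading.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these names are UIMA classes or methods.
final class ToyAddresses {
    // Every slot is stored and loaded, so each item keeps its position.
    static Map<String, Integer> keepAll(List<String> heap) {
        Map<String, Integer> addr = new LinkedHashMap<>();
        for (int i = 0; i < heap.size(); i++) {
            addr.put(heap.get(i), i);
        }
        return addr;
    }

    // Only reachable items are stored; survivors are packed and renumbered.
    static Map<String, Integer> dropUnreachable(List<String> heap,
                                                Set<String> reachable) {
        Map<String, Integer> addr = new LinkedHashMap<>();
        int next = 0;
        for (String fs : heap) {
            if (reachable.contains(fs)) {
                addr.put(fs, next++);
            }
        }
        return addr;
    }
}
```

With a heap of [A, B, C] where B is unreachable, keepAll() leaves C at
position 2, while dropUnreachable() moves it to position 1.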

Cheers,

-- Richard
