On 07.01.2016, at 15:12, Marshall Schor <[email protected]> wrote:
>
> Thanks for explaining this "use case".
>
> I was a bit unclear on the two instances of deserialization time.
> One (the 70%) was xmi, the other (2%) was S+. From reading the email chain, it
> seems S+ is the "CasCompleteSerializer". This switches to plain binary mode.
> So you would avoid the XML parsing overhead.
>
> But I think both deserializations would have the same issue around "allow_dups"
> if that was where the substantial part of the slowdown was being spent, since
> both would add all those annotations to the index. Perhaps that was another use
> case though... Am I mixing these up?

My understanding is that the CasCompleteSerializer (de)serializes the heap structures and indexes as-is. So on loading, FSes do not pass through addToIndexes() and the allow_dups check at all. This should be what makes S and S+ faster than the other approaches, which call addToIndexes().

Btw., the Cas(Complete)Serializer also has the nice effect that the addresses of FSes remain fully stable, since even unindexed/unreachable FSes are stored and loaded. I think all other serializers drop unreachable FSes, which can cause addresses to change. I have a use case in WebAnno where I rely entirely on the stable addresses provided by the Cas(Complete)Serializer.

Cheers,

-- Richard
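
To illustrate the address-stability point, here is a toy sketch. It is not UIMA code and uses no UIMA classes: ToyHeap, copyComplete() and copyReachableOnly() are hypothetical names made up for this example, and a plain list stands in for the CAS heap. It only shows why copying a heap as-is keeps addresses, while rebuilding it from the indexed FSes can shift them.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT UIMA code. All names here are hypothetical.
class ToyHeap {
    // Slot i holds the FS whose "address" is i.
    final List<String> slots = new ArrayList<>();
    // Addresses of the FSes that were added to an index (the reachable ones).
    final List<Integer> indexed = new ArrayList<>();

    // Allocates an FS on the heap and optionally indexes it; returns its address.
    int create(String fs, boolean addToIndex) {
        slots.add(fs);
        int addr = slots.size() - 1;
        if (addToIndex) {
            indexed.add(addr);
        }
        return addr;
    }

    // Returns the address of the given FS, or -1 if it is not on the heap.
    int addressOf(String fs) {
        return slots.indexOf(fs);
    }

    // "Complete" style: heap and index are copied as-is, unindexed FSes included.
    ToyHeap copyComplete() {
        ToyHeap out = new ToyHeap();
        out.slots.addAll(slots);
        out.indexed.addAll(indexed);
        return out;
    }

    // "Reachable-only" style: only indexed FSes survive and each one is
    // re-created and re-indexed, so the heap is compacted.
    ToyHeap copyReachableOnly() {
        ToyHeap out = new ToyHeap();
        for (int addr : indexed) {
            out.create(slots.get(addr), true);
        }
        return out;
    }
}

class AddressStabilityDemo {
    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        heap.create("Token", true);                 // address 0
        heap.create("Scratch", false);              // address 1, never indexed
        int before = heap.create("Sentence", true); // address 2

        int complete = heap.copyComplete().addressOf("Sentence");
        int reachable = heap.copyReachableOnly().addressOf("Sentence");
        // prints "2 2 1"
        System.out.println(before + " " + complete + " " + reachable);
    }
}
```

In this toy, the unindexed "Scratch" FS is what makes the difference: once it is dropped, every FS allocated after it moves down by one slot. Anything that stored the old address, as in the WebAnno case above, would then point at the wrong FS.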
