Looking at the code in CASImpl, I see

(line 112) cas.impl.Serialization: public static void deserializeCASComplete(CASCompleteSerializer casCompSer ...)
  -> (line 1231) reinit(CASCompleteSerializer....)
  calls (line 1260) (line 1293) reinit( a bunch of arrays, including int[] fsIndex which is an array of things to add to indexes)
  calls (line 1307) (line 1730) reinitIndexedFSs(int[] fsIndex)

which has a double loop - outer for views, inner for all indexed fs where it does (line 1770) addFS(fsIndex[i])
which does a one element add of the feature structure to the indexes.

Perhaps though I'm following the wrong code path ...

-Marshall

On 1/7/2016 10:36 AM, Richard Eckart de Castilho wrote:
> On 07.01.2016, at 15:12, Marshall Schor <[email protected]> wrote:
>> Thanks for explaining this "use case".
>>
>> I was a bit unclear on the two instances of deserialization time.
>> One (the 70%) was xmi, the other (2%) was S+. From reading the email chain, it
>> seems S+ is the "CasCompleteSerializer". This switches to plain binary mode.
>> So you would avoid the XML parsing overhead.
>>
>> But I think both deserializations would have the same issue around "allow_dups"
>> if that was where the substantial part of the slowdown was being spent, since
>> both would add all those annotations to the index. Perhaps that was another use
>> case though... Am I mixing these up?
> My understanding is that the CasCompleteSerializer is (de)serializing the heap
> structures and indexes as-is. So on loading, FSes are not passing through
> addToIndexes() and allow_dups at all. This should be what makes the S and S+
> faster than the other approaches that call addToIndexes().
>
> Btw. Cas(Complete)Serializer also has the nice effect that the addresses of
> FSes remain fully stable as even unindexed/unreachable FSes are stored and
> loaded. I think all other serializers drop unreachable which can cause addresses
> to change. I have a usecase in WebAnno where I'm absolutely relying on the
> stable addresses provided by the Cas(Complete)Serializer.
>
> Cheers,
>
> -- Richard
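
For readers following the trace at the top of this message, here is a minimal Java sketch of the double loop in reinitIndexedFSs: outer over views, inner over the indexed FSs, with a one-element add per entry. It is not the CASImpl code. The class names ViewIndexSketch and ReinitIndexSketch are hypothetical, and the layout of the fsIndex array (a view count, then a count and addresses per view) is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-view index repository; not a UIMA class.
class ViewIndexSketch {
    final List<Integer> indexed = new ArrayList<>();

    // One-element add, analogous in spirit to the addFS(fsIndex[i]) call in the trace.
    void addFS(int fsAddr) {
        indexed.add(fsAddr);
    }
}

class ReinitIndexSketch {
    // ASSUMED layout (not taken from CASImpl): fsIndex[0] is the number of views,
    // followed, for each view, by a count and then that many FS addresses.
    static List<ViewIndexSketch> reinitIndexedFSs(int[] fsIndex) {
        List<ViewIndexSketch> views = new ArrayList<>();
        int pos = 0;
        int numViews = fsIndex[pos++];
        for (int v = 0; v < numViews; v++) {        // outer loop: views
            ViewIndexSketch view = new ViewIndexSketch();
            int count = fsIndex[pos++];
            for (int i = 0; i < count; i++) {       // inner loop: indexed FSs of this view
                view.addFS(fsIndex[pos++]);         // one add per feature structure
            }
            views.add(view);
        }
        return views;
    }

    public static void main(String[] args) {
        // Two views: the first indexes two FSs, the second indexes one.
        int[] fsIndex = {2, 2, 100, 104, 1, 200};
        List<ViewIndexSketch> views = reinitIndexedFSs(fsIndex);
        for (int v = 0; v < views.size(); v++) {
            System.out.println("view " + v + ": " + views.get(v).indexed);
        }
    }
}
```

The point of the sketch is only that, on this path, every indexed FS goes through an individual add call, so the cost scales with the number of indexed FSs rather than being a bulk copy of index structures.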
