On 08.07.2013 at 23:49, Marshall Schor <[email protected]> wrote:

>> The documentation says:
>>
>>> Deserialize with type filtering:
>>>
>>> The reuseInfo should be null unless deserializing a delta CAS, in which
>>> case, it must be the reuse info captured when the original CAS was
>>> serialized out. If the target type system is identical to the one in the
>>> CAS, you may pass null for it. If a delta cas is not being received, you
>>> must pass null for the reuseInfo.
>>>
>>> Serialization.deserializeCAS(cas, bais, tgtTypeSystem, reuseInfo);
>>
>> So I assume that when I deserialize my persisted CAS into a fresh one which
>> doesn't contain any types, the only thing that should arrive is the SofA.
>> But, no matter what serialization format I use (0, 4, or 6), I always get an
>> ArrayIndexOutOfBoundsException.
>>
>> I create the target CAS like this:
>>
>> CAS cas = CasCreationUtils.createCas((TypeSystemDescription) null,
>> null, null);
>>
>> Format 6:
>>
>> java.lang.ArrayIndexOutOfBoundsException: 37
>> at org.apache.uima.cas.impl.TypeSystemImpl.getTypeInfo(TypeSystemImpl.java:1566)
>> at org.apache.uima.cas.impl.BinaryCasSerDes6.deserializeAfterVersion(BinaryCasSerDes6.java:1701)
>> at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1203)
>> at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1168)
>> at org.apache.uima.cas.impl.Serialization.deserializeCAS(Serialization.java:171)
>> …
>>
>> Am I misunderstanding how the (de)serialization is supposed to work?
>
> Form 6 supports having different type systems. When using this, it expects the
> "other" type system to be passed in, as a type system impl object. If "null" is
> passed in, then it assumes the "other" type system is identical to the first
> one. (this is what the JavaDocs mean, when it says:
>
> If the target type system is identical to the one in the CAS, you may pass
> null for it.

In the sentence above, I assumed that "CAS" means "the CAS which I deserialize into/the target CAS" and that "target type system is identical" means that "I want all types available in the target CAS to be deserialized/I do not want any types that are available in the target CAS to be ignored".

> So, to make form 6 work for you, you have to do something like:
>
> a) Create an instance of a type system impl for the types in your serialized
> form.
> For instance, if you created a CAS with some types in it, and serialized it,
> before you get rid of that CAS, save its type system in a variable:
>
> TypeSystem tsThatWasSerialized = theCASthatWasSerialized.getTypeSystem();
>
> Use this type system as the argument, (not "null") when calling the form 6
> style deserialize:
>
> Serialization.deserializeCAS(cas, bais, tsThatWasSerialized, null);
>
> Is that something like what you did?

Nope, that's not what I did. I thought it was not necessary to preserve the "source" type system. I interpreted the documentation such that "tsThatWasSerialized" was not the "source" type system, but the "target" type system (e.g. a subset of the actual target CAS type system).

Ignoring the potential waste of space, wouldn't you find it useful to serialize all used types of the type system as part of format 6, thus avoiding the need to maintain an external copy of the type system?

The CASCompleteSerializer conveniently wraps up all data (CAS + type system) in a single serializable object. I find that very convenient. The only annoying part is that it's not possible to deserialize that into a CAS with a new type system, e.g. with some types added or removed.

Btw, it might be nice if deserializeCAS() could detect not only the formats 0, 4 and 6, but also serialized forms of the CASCompleteSerializer.

Did you do any performance measurements for the new serialization forms?

-- Richard
