Hi Richard,

First of all, my NullPointerException occurred because I was deserializing from XMI rather than from a binary serialization, so I switched to the binary one.
But then I got a new problem: a ClassCastException when casting the return value of ll_getFSForRef(address) to SentenceAnnotation. The returned value was of type AnnotationImpl, which is not in the inheritance hierarchy of Annotation (a bit misleading, but OK). This only happened with a deserialized JCas; a normal one created in memory did return a valid Annotation.

This was solved by following your suggestion and calling jcas.getCAS().getJCas() once after deserializing the JCas (and before trying to resolve any addresses).

I looked into it a little, and it seems that the problem lies here: I create an empty JCas just before deserializing into it, using ae.newJCas(), where ae is an AnalysisEngine object. The CAS created for the new JCas is for some reason *different* from its svd.baseCAS. As a result, deserializeCASComplete (or actually CASImpl.reinit) eventually deserializes into svd.baseCAS, explicitly putting null in the jcas. Because of this null I must call jcas.getCAS().getJCas() afterwards; otherwise the jcas is somewhat broken.

I don't know whether this is intentional behavior or an edge-case bug, or whether it happens because of the way I create the JCas and deserialize into it. But it is definitely pretty weird behavior, which may be worth some consideration.

Ofer

On Mon, May 19, 2014 at 5:33 PM, Richard Eckart de Castilho <[email protected]> wrote:

> Hi Ofer,
>
> I can tell you that in WebAnno (non-ASF) we use the
> CASCompleteSerializer to persist the CAS and we use the addresses of
> annotations across de/serialization cycles to refer to annotations.
> We found this to work reliably (with stable addresses) whereas other
> forms of serialization, e.g. the compressed binary formats, do not
> maintain stable CAS addresses.
>
> It sounds a bit as if the JCas structures may not have been
> properly set up yet.
> Maybe try calling something like cas.getJCas() or
> even jcas.getCAS().getJCas() before trying to resolve the
> address against the CAS.
>
> Cheers,
>
> -- Richard
>
> On 19.05.2014, at 16:14, Ofer Bronstein <[email protected]> wrote:
>
> > Hi Richard and all,
> >
> > Thank you for the idea. I tried using your idea with ll_getFSForRef(), but
> > I get a NullPointerException:
> > In CASImpl.ll_getFSForRef(int fsRef), in the last line of the method (line
> > 3117), the expression this.svd.localFsGenerators[getHeap().heap[fsRef]]
> > returns null, but since the full phrase
> > is this.svd.localFsGenerators[getHeap().heap[fsRef]].createFS(fsRef, this),
> > we get a NullPointerException since we're trying to call createFS(fsRef,
> > this) on null.
> >
> > The address I am using is definitely on a Sentence Annotation that exists
> > in the CAS, in the _InitialView, and I got the address by calling
> > getAddress() on it and saving the Integer.
> > Can you think of any reason why this happens? Or, should I do something
> > special to make the address valid, or have the FeatureStructure retrievable
> > from it?
> >
> > Thank you,
> > Ofer
> >
> >
> > On Mon, May 19, 2014 at 1:24 PM, Richard Eckart de Castilho
> > <[email protected]> wrote:
> >
> >> Hi Ofer,
> >>
> >> I'm not an expert on Java Serialization but here goes nothing ;)
> >>
> >> 1) I suppose you could override the default Java Serialization process for
> >> your Document class and handle the de/serialization of the CAS via
> >> the CASCompleteSerializer - that would basically be the special treatment.
> >>
> >> 2) I do not think that you can make JCas objects (like SentenceAnnotation)
> >> "survive" the serialization process because they are not serializable.
> >> If you manage to de/serialize the CAS using CASCompleteSerializer, then
> >> you can make use of the CAS addresses in each annotation. Your Sentence
> >> object can maintain a reference to the address of each SentenceAnnotation.
> >> When you want to access the SentenceAnnotation through your Sentence,
> >> you do so by resolving the address against the loaded JCas:
> >>
> >> (Store this address in your Sentence)
> >> int address = sentenceAnnotation.getAddress()
> >>
> >> (Use it later after deserialization to fetch the SentenceAnnotation from
> >> the JCas)
> >> (SentenceAnnotation) aJCas.getLowLevelCas().ll_getFSForRef(address)
> >>
> >> Btw. this is as fast as it gets - JCas wrappers use such code internally.
> >>
> >> I'd say what you plan to do should work but it verges on the border of
> >> black magic! But then again, I've done similar stuff ;)
> >>
> >> Cheers,
> >>
> >> -- Richard
> >>
> >> In your Document object, make the CAS a
> >>
> >> On 19.05.2014, at 12:04, Ofer Bronstein <[email protected]> wrote:
> >>
> >>> Hi Richard and all,
> >>>
> >>> Thank you for your answer. This is still only a partial solution, as:
> >>>
> >>> 1. The JCas is referenced from inside a Document object, and by your
> >>> suggestion, I must serialize both of them separately. For instance, write
> >>> it alternating: <Document, JCas, Document, JCas, ...>, or implement
> >>> Serializable.writeObject() and call
> >>> ObjectOutputStream.defaultWriteObject() for the other fields. However, I am
> >>> looking for a way to have the serializer of the document just go through
> >>> its default writeObject() implementation, and only when it encounters the
> >>> JCas field - then some special treatment would be triggered.
> >>>
> >>> 2. More importantly - my Sentence object (referenced by a Document object)
> >>> has a reference to a Sentence Annotation. This Annotation cannot be
> >>> serialized by the method you suggest, as it only takes a full CAS. Of
> >>> course I could implement here something that when deserializing, I would
> >>> iterate through the CAS and find each sentence's annotation and manually
> >>> put its reference in the Sentence object.
> >>> But this is pretty complicated,
> >>> and would be a very lengthy process during deserialization. So I am looking
> >>> for a way for the SentenceAnnotation references to "survive" the
> >>> serialization\deserialization.
> >>>
> >>> Do you have any ideas?
> >>>
> >>> Thank you,
> >>> Ofer
> >>>
> >>>
> >>> On Mon, May 19, 2014 at 12:19 PM, Richard Eckart de Castilho
> >>> <[email protected]> wrote:
> >>>
> >>>> Hello Ofer,
> >>>>
> >>>> the CAS cannot be serialized directly, but there is a helper class
> >>>> which is serializable.
> >>>>
> >>>> To write:
> >>>>
> >>>> ObjectOutputStream docOS = ...
> >>>> CASCompleteSerializer serializer =
> >>>>     Serialization.serializeCASComplete(aJCas.getCasImpl());
> >>>> docOS.writeObject(serializer);
> >>>>
> >>>> To read:
> >>>>
> >>>> ObjectInputStream is = ...
> >>>> CASCompleteSerializer serializer = (CASCompleteSerializer) is.readObject();
> >>>> Serialization.deserializeCASComplete(serializer, (CASImpl) aCAS);
> >>>>
> >>>> However, there are newer and more efficient binary formats that you might
> >>>> want to use [1].
> >>>>
> >>>> If you want to dig into the topic or if you want to use a ready-made pair of
> >>>> readers/writers for the binary formats, you could consider taking a look at
> >>>> the BinaryCasReader/Writer in the DKPro Core [2,3] (non-ASF).
> >>>>
> >>>> Cheers,
> >>>>
> >>>> -- Richard
> >>>>
> >>>> [1] http://uima.apache.org/d/uimaj-2.6.0/tutorials_and_users_guides.html#ugr.tug.type_filtering.compressed_file
> >>>> [2] https://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.io.bincas-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/io/bincas/BinaryCasReader.java
> >>>> [3] https://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.io.bincas-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/io/bincas/BinaryCasWriter.java
> >>>>
> >>>> On 19.05.2014, at 11:03, Ofer Bronstein <[email protected]> wrote:
> >>>>
> >>>>> Hi Guys,
> >>>>>
> >>>>> I am an Israeli Master's Student, and have been happily working with UIMA
> >>>>> for the past two years.
> >>>>> I hope this is the right place for my question -
> >>>>>
> >>>>> I have a Document object I created, which has a JCas member with
> >>>>> annotations over a document.
> >>>>> I also have a Sentence object, with a member referencing its Sentence
> >>>>> Annotation in the corresponding JCas. Each Document object references all
> >>>>> of its Sentence objects.
> >>>>> I would like to dump each Document object as a file on disk, using the
> >>>>> default Java serialization. Later they would also be deserialized back into
> >>>>> the Java objects. I understand I would need some special treatment for the
> >>>>> JCases and the Sentence Annotations as they are not serializable (now I get
> >>>>> NotSerializableException). Hopefully the treatment could be as minimal as
> >>>>> possible.
> >>>>>
> >>>>> How do you suggest to do this, regarding serialization of JCas and
> >>>>> combining it with Java serialization?
> >>>>>
> >>>>> I am working on Windows, with Java 1.6 and UIMA 2.4.0.
> >>>>> I am using the same
> >>>>> type system and the same 3 views for all JCases and annotations.
> >>>>>
> >>>>> Thank you,
> >>>>> Ofer Bronstein
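
For readers skimming the thread, the pattern discussed above (custom writeObject()/readObject() on the Document class, with each Sentence holding an integer address instead of an annotation reference) can be sketched with plain JDK classes. This is only an illustration under stated assumptions: FakeCas, SentenceSketch, DocumentSketch and SerializationSketch are hypothetical names invented here and are not UIMA classes. In a real UIMA setup the int[] snapshot would presumably be replaced by the CASCompleteSerializer, and the lookup by ll_getFSForRef(address), preceded by the jcas.getCAS().getJCas() call mentioned at the top of the thread. The sketch also assumes Java 7 or later (try-with-resources), which is newer than the Java 1.6 mentioned in the original question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a non-serializable CAS. NOT a UIMA class.
class FakeCas {
    private final int[] heap;
    FakeCas(int[] heap) { this.heap = heap.clone(); }
    int[] snapshot() { return heap.clone(); }        // serializable view of the state
    int valueAt(int address) { return heap[address]; }
}

// Hypothetical Sentence: stores an address, not a reference into the CAS.
class SentenceSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    private final int address;
    SentenceSketch(int address) { this.address = address; }
    // Resolve the stored address against whichever CAS is currently loaded.
    int resolve(FakeCas cas) { return cas.valueAt(address); }
}

// Hypothetical Document: default serialization for ordinary fields,
// special treatment only for the non-serializable CAS field.
class DocumentSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;
    private final SentenceSketch sentence;
    private transient FakeCas cas;                   // skipped by defaultWriteObject()

    DocumentSketch(String name, FakeCas cas, int sentenceAddress) {
        this.name = name;
        this.cas = cas;
        this.sentence = new SentenceSketch(sentenceAddress);
    }

    String getName() { return name; }
    FakeCas getCas() { return cas; }
    SentenceSketch getSentence() { return sentence; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();                    // name and sentence
        out.writeObject(cas.snapshot());             // serializable stand-in for the CAS
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        cas = new FakeCas((int[]) in.readObject()); // rebuild the CAS from the snapshot
    }
}

class SerializationSketch {
    // Serialize to memory and read back; checked exceptions are wrapped for brevity.
    static DocumentSketch roundTrip(DocumentSketch doc) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(doc);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (DocumentSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DocumentSketch doc =
                new DocumentSketch("doc-1", new FakeCas(new int[] {10, 20, 30}), 2);
        DocumentSketch copy = roundTrip(doc);
        System.out.println(copy.getName() + " resolved="
                + copy.getSentence().resolve(copy.getCas()));
    }
}
```

The design point is that the Sentence never needs the annotation object to survive serialization; it only needs an address that stays stable across the round trip, which in the thread is reported to hold for CASCompleteSerializer but not for the compressed binary formats.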
