Hi Richard,

First of all, my NullPointerException was because I was deserializing from
an XMI and not from a binary serialization, so I switched to the binary one.

But then I got a new problem - I got a ClassCastException when casting the
return value of ll_getFSForRef(address) to SentenceAnnotation. This is
because the returned value was of type AnnotationImpl, which is not in the
inheritance hierarchy of Annotation (a bit misleading, but OK). This
only happened with a deserialized JCas; a normal one created in memory
did return a valid Annotation. It was solved by following your
suggestion and calling jcas.getCAS().getJCas() once after deserializing the
JCas (and before trying to resolve any addresses).
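
For anyone finding this thread later, here is a minimal sketch of the overall
pattern (custom writeObject/readObject for the non-serializable member, plus
int addresses instead of object references). It is plain Java and does not use
UIMA at all: AnnotationStore, Sentence, Document and AddressRoundTripDemo are
hypothetical names, and AnnotationStore is only a stand-in for the JCas. In
real code the marked lines would presumably use the CASCompleteSerializer and
ll_getFSForRef calls quoted below, followed by jcas.getCAS().getJCas().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the JCas: deliberately NOT Serializable. */
class AnnotationStore {
    private final List<String> heap = new ArrayList<>();

    /** Adds an entry and returns its address (here simply the list index). */
    int add(String coveredText) {
        heap.add(coveredText);
        return heap.size() - 1;
    }

    /** Stand-in for resolving an address, like ll_getFSForRef(address). */
    String resolve(int address) {
        return heap.get(address);
    }

    /** Stand-in for a serializable helper such as CASCompleteSerializer. */
    ArrayList<String> snapshot() {
        return new ArrayList<>(heap);
    }

    /** Stand-in for deserializing into an empty store. */
    void restore(List<String> snapshot) {
        heap.clear();
        heap.addAll(snapshot);
    }
}

/** Keeps an int address instead of a reference to the annotation object. */
class Sentence implements Serializable {
    private static final long serialVersionUID = 1L;
    final int address;

    Sentence(int address) {
        this.address = address;
    }

    String text(AnnotationStore store) {
        return store.resolve(address);
    }
}

class Document implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<Sentence> sentences = new ArrayList<>();
    // transient: skipped by default serialization, handled manually below
    transient AnnotationStore store = new AnnotationStore();

    Sentence addSentence(String text) {
        Sentence sentence = new Sentence(store.add(text));
        sentences.add(sentence);
        return sentence;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();          // all ordinary fields
        out.writeObject(store.snapshot()); // special treatment for the store
    }

    @SuppressWarnings("unchecked")
    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        store = new AnnotationStore();     // field initializers do not run here
        store.restore((List<String>) in.readObject());
    }
}

class AddressRoundTripDemo {
    /** Serializes to memory and back; checked exceptions are wrapped. */
    static Document roundTrip(Document doc) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(doc);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Document) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = new Document();
        doc.addSentence("First sentence.");
        doc.addSentence("Second sentence.");
        Document copy = roundTrip(doc);
        for (Sentence s : copy.sentences) {
            System.out.println(s.address + " -> " + s.text(copy.store));
        }
    }
}
```

The sketch only works because the stand-in store hands out the same addresses
after restore(); with UIMA that stability depends on the serialization format
used, as Richard points out below.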

I looked a little into it, and it seems that the problem lies here:
I create an empty JCas just before deserializing into it, using ae.newJCas(),
where ae is an AnalysisEngine object. The CAS created for the new JCas is
for some reason *different* from its svd.baseCAS. This has the effect that
deserializeCASComplete (or actually CASImpl.reinit) eventually uses the
svd.baseCAS for deserializing into, explicitly putting null in the jcas.
This null means I must afterwards call jcas.getCAS().getJCas(), otherwise
the jcas is somewhat broken.

I don't know whether this is intentional behavior or an edge-case bug, or
whether it happens because of the way I create the JCas and deserialize
into it. But it is definitely pretty weird behavior, which may be worth
some consideration.

Ofer


On Mon, May 19, 2014 at 5:33 PM, Richard Eckart de Castilho
<[email protected]>wrote:

> Hi Ofer,
>
> I can tell you that in WebAnno (non-ASF) we use the
> CASCompleteSerializer to persist the CAS and we use the addresses of
> annotations across de/serialization cycles to refer to annotations.
> We found this to work reliably (with stable addresses) whereas other
> forms of serialization, e.g. the compressed binary formats, do not
> maintain stable CAS addresses.
>
> It sounds a bit as if the JCas structures may not have been
> properly set up yet. Maybe try calling something like cas.getJCas() or
> even jcas.getCAS().getJCas() before trying to resolve the
> address against the CAS.
>
> Cheers,
>
> -- Richard
>
> On 19.05.2014, at 16:14, Ofer Bronstein <[email protected]> wrote:
>
> > Hi Richard and all,
> >
> > Thank you for the idea. I tried using your idea with ll_getFSForRef(),
> > but I get a NullPointerException:
> > In CASImpl.ll_getFSForRef(int fsRef), in the last line of the method
> > (line 3117), the expression
> > this.svd.localFsGenerators[getHeap().heap[fsRef]]
> > evaluates to null, but since the full expression is
> > this.svd.localFsGenerators[getHeap().heap[fsRef]].createFS(fsRef, this),
> > we get a NullPointerException since we're trying to call
> > createFS(fsRef, this) on null.
> >
> > The address I am using is definitely on a Sentence Annotation that exists
> > in the CAS, in the _InitialView,  and I got the address by calling
> > getAddress() on it and saving the Integer.
> > Can you think of any reason why this happens? Or, should I do something
> > special to make the address valid, or have the FeatureStructure
> > retrievable from it?
> >
> > Thank you,
> > Ofer
> >
> >
> > On Mon, May 19, 2014 at 1:24 PM, Richard Eckart de Castilho
> > <[email protected]>wrote:
> >
> >> Hi Ofer,
> >>
> >> I'm not an expert on Java Serialization but here goes nothing ;)
> >>
> >> 1) I suppose you could override the default Java Serialization process
> >> for your Document class and handle the de/serialization of the CAS via
> >> the CASCompleteSerializer - that would basically be the special
> >> treatment.
> >>
> >> 2) I do not think that you can make JCas objects (like
> >> SentenceAnnotation) "survive" the serialization process because they
> >> are not serializable.
> >> If you manage to de/serialize the CAS using CASCompleteSerializer, then
> >> you can make use of the CAS addresses in each annotation. Your Sentence
> >> object can maintain a reference to the address of each
> >> SentenceAnnotation.
> >> When you want to access the SentenceAnnotation through your Sentence,
> >> you do so by resolving the address against the loaded JCas:
> >>
> >> (Store this address in your Sentence)
> >>  int address = sentenceAnnotation.getAddress();
> >>
> >> (Use it later after deserialization to fetch the SentenceAnnotation from
> >> the JCas)
> >>  SentenceAnnotation sa = (SentenceAnnotation)
> >>      aJCas.getLowLevelCas().ll_getFSForRef(address);
> >>
> >> Btw. this is as fast as it gets - JCas wrappers use such code
> >> internally.
> >>
> >> I'd say what you plan to do should work but it verges on the border of
> >> black magic! But then again, I've done similar stuff ;)
> >>
> >> Cheers,
> >>
> >> -- Richard
> >>
> >> In your Document object, make the CAS a
> >>
> >> On 19.05.2014, at 12:04, Ofer Bronstein <[email protected]> wrote:
> >>
> >>> Hi Richard and all,
> >>>
> >>> Thank you for your answer. This is still only a partial solution, as:
> >>>
> >>> 1. The JCas is referenced from inside a Document object, and by your
> >>> suggestion, I must serialize both of them separately. For instance,
> >>> write it alternating: <Document, JCas, Document, JCas, ...>, or
> >>> implement Serializable.writeObject() and call
> >>> ObjectOutputStream.defaultWriteObject() for the other fields. However,
> >>> I am looking for a way to have the serializer of the document just go
> >>> through its default writeObject() implementation, and only when it
> >>> encounters the JCas field - then some special treatment would be
> >>> triggered.
> >>>
> >>> 2. More importantly - my Sentence object (referenced by a Document
> >>> object) has a reference to a Sentence Annotation. This Annotation
> >>> cannot be serialized by the method you suggest, as it only takes a
> >>> full CAS. Of course I could implement something here so that, when
> >>> deserializing, I would iterate through the CAS, find each sentence's
> >>> annotation and manually put its reference in the Sentence object. But
> >>> this is pretty complicated, and would be a very lengthy process during
> >>> deserialization. So I am looking for a way for the SentenceAnnotation
> >>> references to "survive" the serialization/deserialization.
> >>>
> >>> Do you have any ideas?
> >>>
> >>> Thank you,
> >>> Ofer
> >>>
> >>>
> >>> On Mon, May 19, 2014 at 12:19 PM, Richard Eckart de Castilho
> >>> <[email protected]> wrote:
> >>>
> >>>> Hello Ofer,
> >>>>
> >>>> the CAS cannot be serialized directly, but there is a helper class
> >>>> which is serializable.
> >>>>
> >>>> To write:
> >>>>
> >>>> ObjectOutputStream docOS = ...
> >>>> CASCompleteSerializer serializer =
> >>>> Serialization.serializeCASComplete(aJCas.getCasImpl());
> >>>> docOS.writeObject(serializer);
> >>>>
> >>>> To read:
> >>>>
> >>>> ObjectInputStream is = ...
> >>>> CASCompleteSerializer serializer =
> >>>> (CASCompleteSerializer) is.readObject();
> >>>> Serialization.deserializeCASComplete(serializer, (CASImpl) aCAS);
> >>>>
> >>>> However, there are newer and more efficient binary formats that you
> >>>> might want to use [1].
> >>>>
> >>>> If you want to dig into the topic or if you want to use a ready-made
> >>>> pair of readers/writers for the binary formats, you could consider
> >>>> taking a look at the BinaryCasReader/Writer in the DKPro Core [2,3]
> >>>> (non-ASF).
> >>>>
> >>>> Cheers,
> >>>>
> >>>> -- Richard
> >>>>
> >>>> [1] http://uima.apache.org/d/uimaj-2.6.0/tutorials_and_users_guides.html#ugr.tug.type_filtering.compressed_file
> >>>> [2] https://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.io.bincas-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/io/bincas/BinaryCasReader.java
> >>>> [3] https://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.io.bincas-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/io/bincas/BinaryCasWriter.java
> >>>>
> >>>> On 19.05.2014, at 11:03, Ofer Bronstein <[email protected]> wrote:
> >>>>
> >>>>> Hi Guys,
> >>>>>
> >>>>> I am an Israeli Master's Student, and have been happily working with
> >>>>> UIMA for the past two years.
> >>>>> I hope this is the right place for my question -
> >>>>>
> >>>>> I have a Document object I created, which has a JCas member with
> >>>>> annotations over a document.
> >>>>> I also have a Sentence object, with a member referencing its Sentence
> >>>>> Annotation in the corresponding JCas. Each Document object references
> >>>>> all of its Sentence objects.
> >>>>> I would like to dump each Document object as a file on disk, using
> >>>>> the default Java serialization. Later they would also be deserialized
> >>>>> back into the Java objects. I understand I would need some special
> >>>>> treatment for the JCases and the Sentence Annotations as they are not
> >>>>> serializable (now I get NotSerializableException). Hopefully the
> >>>>> treatment could be as minimal as possible.
> >>>>>
> >>>>> How would you suggest doing this, i.e. serializing the JCas and
> >>>>> combining that with Java serialization?
> >>>>>
> >>>>> I am working on Windows, with Java 1.6 and UIMA 2.4.0. I am using the
> >>>>> same type system and the same 3 views for all JCases and annotations.
> >>>>>
> >>>>> Thank you,
> >>>>> Ofer Bronstein
>
