Very good
On Wed, Aug 27, 2014 at 2:39 AM, [email protected] (Andy McMurry) <[email protected]> wrote:
> Interesting thread in UIMA core about JSON serialization of CASs and
> Descriptors.
>
> Begin forwarded message:
>
> > From: Marshall Schor <[email protected]>
> > Subject: Re: [jira] [Created] (UIMA-3969) Add JSON Serialization for
> > CASs and UIMA Descriptors
> > Date: August 25, 2014 at 8:33:54 PM PDT
> > To: [email protected]
> > Reply-To: [email protected]
> >
> > On 8/25/2014 6:54 PM, Jens Grivolla wrote:
> >> Is the JSON serialization documented somewhere?
> > Yes, there's a chapter in the reference book. You can build that
> > (uima-docbook-references), until it's released.
> >
> > There are also lots of Javadocs in the main implementing class:
> > XmiCasSerializer. (It's in this class because it shares a lot of the
> > machinery with Xmi serialization).
> >
> >> I saw that there appear to be quite a few alternative serializations. It
> >> seems to include something like a typesystem definition, but only with a
> >> list of feature names, not their types, if I understood the format
> >> correctly (@featureRefs has a list of the features that are not of
> >> primitive types, it seems).
> > The @featureRefs lists only those features which are "references" to
> > other feature structures.
> >
> > You're correct in noticing that the feature "range" types are not
> > present. This is because the serialization is to JSON, which has a
> > native representation for collections (JSON arrays), which could be
> > UIMA Arrays or Lists, and ranges that are boolean are representable by
> > JSON true and false values. There is no distinction that a number is a
> > byte/short/int/long, because those are all represented as a JSON
> > "number". And so forth...
> >
> > The JSON serialization for a CAS can optionally include parts of the
> > type system: It can include what the supertypes are for serialized
> > types (to enable iterating over a type and all of its subtypes, like
> > CAS iterators normally do); it can also identify which slots that
> > appear to hold number values are actually to be interpreted as
> > references to other feature structures. Otherwise, the serialized form
> > might have a slot "foo" : 111 which is a number value, and a slot
> > "bar" : 112 which is a reference to another feature structure whose ID
> > is 112. This extra information (in @featureRefs) gives the user of the
> > JSON serialized form a way to distinguish these two cases.
> >
> >> It would be very useful if the serialization allowed one to easily pull
> >> out a partial CAS with just a subset of the views (by only including
> >> some subtrees of the JSON structure), and merge views into it.
> > Another optional part of the serialization is a list of views, together
> > with an array of numbers, each one of which represents a serialized
> > Feature Structure that is indexed in that view.
> >> This might be complicated, as I understand that the views define
> >> annotation indices, but the same annotation can be indexed in several
> >> views, right?
> >
> > Feature Structures can be classified into "Annotations" and other types
> > (not a subtype of Annotation).
> >
> > Annotations are special - they have an implied reference to a
> > particular subject of analysis. So they are restricted to being indexed
> > in the view that is associated with that subject-of-analysis.
> >
> > Other types (not subtypes of Annotation (or more precisely,
> > AnnotationBase)) do not have this restriction, and can be indexed in
> > multiple views.
> >
> > See
> > http://uima.apache.org/d/uimaj-2.6.0/tutorials_and_users_guides.html#ugr.tug.aas.annotations_associated_sofa
> >
> > Let me know where the documentation might be improved :-)
> >
> > -Marshall
> >>
> >> -- Jens
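
To make the @featureRefs point in the quoted message concrete, here is a minimal Python sketch of how a consumer might tell a plain number ("foo" : 111) from a reference ("bar" : 112), and look up which feature structures are indexed in a view. The JSON layout below (the "types", "featureStructures" and "views" keys, the "Example" type, and the view names) is an assumption invented for illustration; it is not the actual output of the UIMA serializer, so check the reference book chapter for the real format.

```python
import json

# Hypothetical, simplified layout for illustration only; the real UIMA JSON
# output is organized differently. Only "@featureRefs", "foo" and "bar" are
# names taken from the thread.
doc = json.loads("""
{
  "types": {
    "Example": {"@featureRefs": ["bar"]}
  },
  "featureStructures": {
    "111": {"type": "Example", "foo": 111, "bar": 112},
    "112": {"type": "Example", "foo": 7}
  },
  "views": {
    "view1": [111, 112],
    "view2": [112]
  }
}
""")


def resolve(doc, fs_id):
    """Return a copy of one feature structure with reference slots replaced
    by the feature structures they point to; other slots are left as-is."""
    fs = doc["featureStructures"][str(fs_id)]
    # Features listed in @featureRefs hold IDs of other feature structures.
    ref_features = set(doc["types"][fs["type"]].get("@featureRefs", []))
    resolved = {}
    for name, value in fs.items():
        if name in ref_features and value is not None:
            resolved[name] = doc["featureStructures"][str(value)]
        else:
            resolved[name] = value
    return resolved


def ids_in_view(doc, view_name):
    """Return the IDs of the feature structures indexed in the given view."""
    return list(doc["views"][view_name])


first = resolve(doc, 111)
print(first["foo"])              # plain number, left untouched
print(first["bar"])              # the feature structure with ID 112
print(ids_in_view(doc, "view2"))
```

Without the @featureRefs entry, both 111 and 112 would just be JSON numbers and the consumer would have no way to know that only "bar" should be followed.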
