Hi Jörn,

Thank you for your comments; I hope you can expand a bit (see below).

On 7/23/2015 9:45 AM, Joern Kottmann wrote:
> Well, I thought about something which can be done in 3, 4 or 5 lines of
> code.
>
> To use a CAS, it's first creating the TypeSystemDescriptor, creating an
> empty CAS and then loading something into it.
> Placing content in it is often done using an AE. If I want to reuse an
> existing deserializer/serializer I always end up with an AE,
> maybe there are some rare exceptions.
>
> In a bigger system there will be a couple of components dealing with CASes,
> if there is a small change to the type system they all have to be updated,
> even when they are not affected by the change, e.g. type addition or a
> change to a type they don't use.

I'd like to understand this better. Since the pipeline's final type system is created at pipeline-startup-time, from the "merge" of all the components' type systems, it seems to me that you would not need to update the type systems in other components not affected by the change? If the concern is the need to have a JCas cover class generated for the merged type system, version 3 is hoping to make that "automatic".

> In our system we have many different
> import pipelines, sometimes those pipelines have specific types which are
> only used in an early stage, if a generic component has to deal with one of
> those CASes the only good option is to merge all type systems together.

Since UIMA pipelines do this type merge, I'm guessing you might be thinking about this outside of UIMA pipelines, such as a scenario where you have one step (using those many different import pipelines), perhaps having those write out some CASes, and then wanting to read in those CASes in another step to be processed by your generic component, and therefore needing that 2nd step to have the merge of all the type systems together, to enable deserializing. Is this the scenario, or is there another use case you're thinking of?

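To make the startup-time "merge" concrete, here is a minimal sketch in plain Java. It is a toy model, not the UIMA API: each component's type system is reduced to a map from type name to feature names, and the type names (`ImportMeta`, `Token`) are hypothetical. In UIMA itself the merge is available through `CasCreationUtils.mergeTypeSystems`, which also has to reconcile supertypes and feature range types; the sketch ignores both.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of a type system merge; NOT the UIMA API. */
public class TypeSystemMergeSketch {

    /**
     * Merges per-component type systems, each modelled as
     * type name -> feature names. Types that share a name are
     * combined by taking the union of their features.
     */
    public static Map<String, Set<String>> merge(
            List<Map<String, Set<String>>> componentTypeSystems) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (Map<String, Set<String>> typeSystem : componentTypeSystems) {
            for (Map.Entry<String, Set<String>> type : typeSystem.entrySet()) {
                merged.computeIfAbsent(type.getKey(), name -> new TreeSet<>())
                      .addAll(type.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical type and feature names, for illustration only.
        Map<String, Set<String>> importer = Map.of("ImportMeta", Set.of("sourceUri"));
        Map<String, Set<String>> tokenizer = Map.of("Token", Set.of("begin", "end"));
        // The tagger adds a "pos" feature; the tokenizer's declaration is untouched.
        Map<String, Set<String>> tagger = Map.of("Token", Set.of("begin", "end", "pos"));

        Map<String, Set<String>> merged = merge(List.of(importer, tokenizer, tagger));
        System.out.println(merged);
        // prints {ImportMeta=[sourceUri], Token=[begin, end, pos]}
    }
}
```

In this model, adding the `pos` feature in one component changes only that component's declaration; the others pick it up through the merged result.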
If this is the scenario, another option would be to have the serialized CASes stored along with a reference to their type system, and have some new deserialization capability be able to locate the referred-to type system along with the CAS to be read in. Would that "solve" this issue, or are there other aspects?

>
> The way we use UIMA is that we let it process our content with different
> custom pipelines, and at the end of each pipeline the results are converted
> into POJOs and those are written into a database, all code which follows
> just uses the POJOs to process the data. My point is: If the CAS would be
> in a better state we could just use it throughout the entire application
> instead of our CAS-like layer.

In version 3, we're planning on storing the Feature Structures as just instances of their JCas Java Cover Objects, pretty close to POJOs. So maybe there's a good chance...

-Marshall

>
> Jörn
>
> On Thu, Jul 23, 2015 at 2:55 PM, Richard Eckart de Castilho <[email protected]>
> wrote:
>
>> On 23.07.2015, at 14:43, Joern Kottmann <[email protected]> wrote:
>>
>>> One thing which must have been overlooked when UIMA was built is that
>>> people (like me) have to write code which wants to interact with the CAS
>>> but can't be an AE. In UIMA the CAS (either in memory, or serialized)
>>> is difficult to be used without implementing an AE.
>> I'm not sure why you feel like that. E.g. in WebAnno (an annotation editor
>> that uses the CAS as its internal data model), we operate with the CAS
>> basically without any AEs. All editing operations are done directly on the
>> CAS which is loaded/saved directly using the UIMA API calls for binary
>> serialization.
>>
>> Basically, we are using the same API that we would be using in an AE, but
>> without the AE/pipelining stuff. It doesn't get any more difficult without
>> the AE - in fact some things become easier without AEs, readers, and
>> consumers.
>>
>> I'm sure you must have something similar in the CAS Editor plugin in
>> Eclipse, no?
>>
>> -- Richard
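
The option raised earlier in this message, storing each serialized CAS along with a reference to its type system, could look roughly like the sketch below. Everything in it is hypothetical: `StoredCas`, `TypeSystemRegistry` and the ID `import-a` are illustrative names, not UIMA classes, and the payload is a plain string standing in for the serialized bytes. It only shows the lookup step a new deserialization capability would need, assuming type systems are registered under stable IDs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy model only: none of these classes are part of UIMA. */
public class TypeSystemReferenceSketch {

    /** A serialized CAS stored together with the ID of its type system. */
    public static final class StoredCas {
        final String typeSystemId;
        final String payload; // stand-in for the serialized CAS bytes

        public StoredCas(String typeSystemId, String payload) {
            this.typeSystemId = typeSystemId;
            this.payload = payload;
        }
    }

    /** Hypothetical registry mapping a type system ID to its type names. */
    public static final class TypeSystemRegistry {
        private final Map<String, Set<String>> byId = new HashMap<>();

        public void register(String id, Set<String> typeNames) {
            byId.put(id, typeNames);
        }

        public Set<String> resolve(String id) {
            Set<String> typeNames = byId.get(id);
            if (typeNames == null) {
                throw new IllegalArgumentException("Unknown type system: " + id);
            }
            return typeNames;
        }
    }

    /**
     * Resolves the referenced type system before touching the payload, so the
     * reading step does not need a pre-merged type system of its own.
     */
    public static String read(StoredCas stored, TypeSystemRegistry registry) {
        Set<String> typeNames = registry.resolve(stored.typeSystemId);
        return "read '" + stored.payload + "' using type system '"
                + stored.typeSystemId + "' with " + typeNames.size() + " type(s)";
    }

    public static void main(String[] args) {
        TypeSystemRegistry registry = new TypeSystemRegistry();
        registry.register("import-a", Set.of("ImportMeta", "Token"));

        StoredCas stored = new StoredCas("import-a", "serialized CAS content");
        System.out.println(read(stored, registry));
    }
}
```

A generic component reading such a record would then depend only on the registry, not on the union of every import pipeline's types.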
