Re: CAS and CasView - concrete proposal
Adam Lally wrote: On 1/5/07, Marshall Schor [EMAIL PROTECTED] wrote: Solution 1: How about always passing in a JCasView object? For unaware components, this would be the view to use. For view aware components, this would be some view (perhaps picked in a similar way), but the user code would be expected to use this or change to other views and use those. You mean passing a JCasView to the user's process method. I'm not seeing how that really helps, and it would create more compatibility issues for existing code (essentially requiring JCas to be replaced by JCasView everywhere). I don't think I like this. I lost the train of thinking about what is passed to Annotators in the JCas case: is it a JCas object or a JCasView object - sounds like in your current thinking, it's a JCas object, right? And for backwards compatibility with sofa-unaware annotators, we would support constructors where the sofa would be set from the JCas current sofa, right? And for sofa-aware annotators, we would have additional constructors: - taking an additional Sofa argument and/or - taking a JCasView argument and/or - both? I think it would be good to post additional sections to the wiki covering this, and what is proposed to be passed to a component's process method. The new MyAnnotation( ) would be changed to take JCasViews as the object. Yes, that's option (b) from my earlier suggestions: (a) add a MyAnnotation constructor that takes a Sofa as an argument, and/or (b) add a MyAnnotation constructor that takes a JCasView instead of a JCas. The featureStructure.addToIndexes() could work if the Java cover object cas/jcas ref object pointed to the view. For JCas cover objects, we could keep this pointing to separate instances of the JCas _Type objects, but that might get a bit expensive - we'd need to have separate instances of the _Type objects for each view (but that's how it's working today).
For giant type systems (1000 types), that's a lot of stuff to create - but maybe there's a lazy way to delay creation and only create those that are actually used. This would involve I think modifying the generator code to allow the generator to be absent, and created on first need. This seems like a lot of trouble to go through to support addToIndexes(). It seems better if we can get our users to switch to calling JCasView.addFsToIndexes(fs) instead. So I think I'd still like to see addToIndexes() be deprecated and only work for the current view. I think I agree with this thought, though I bet our users don't like it for the single sofa case. If we can make that case work, I'm OK with requiring sofa-aware components to say which view they want to index things in when calling add/removeTo/FromIndexes. --- Solution 2: General problem: new Annotation and add/removeTo/FromIndexes need an additional argument: - the Sofa for Annotation and - the index-set (or view) for the add/removeTo/FromIndexes sofa/view-unaware components want to ignore this issue (for simplicity). How about a new framework method that lets users set the current view/sofa? This follows the use case that normal view-aware code works with one view at a time, in terms of adding/removing information. It has a bad aspect of being a bit indirect, and side-effect-ish. So I don't think I like it as much. But here's what it might look like: snip/ That's a lot like what we have today, where users interact with the JCas interface in order to interact with views. You've essentially just replaced getView() with a method like switchView(), that changes what view the JCas is pointing at. That seems problematic because it would also change the view for any other piece of code that might have had a handle to that JCas. I take it that's what you meant by side-effect-ish. Here's a specific proposal: 1) deprecate TOP.addToIndexes().
Document that it's here only to provide compatibility with older single-sofa code, and does not support multi-sofa code. Suggest that users migrate to JCasView.addFsToIndexes(fs) instead. (or fs.addToIndexes(JCasView) ? Note: this is currently an existing API, if you change JCasView to JCas, since right now we don't have a separate JCasView ) I agree with this for sofa-aware components. For sofa-unaware components, it seems like extra work to require specifying the view. 2) Add the constructors AnnotationBase(JCas, Sofa) and AnnotationBase(JCasView). These set the sofa pointer of the new annotation appropriately. Yes, these seem appropriate. 3) Deprecate the constructor AnnotationBase(JCas). It always sets the Sofa reference to the current view and does not support multi-sofa code. Users should migrate to one of the other constructors. I'm not sure we want sofa-unaware components to write as if they were aware of all the multi-view, multi-sofa machinery. If a large percentage (more than 50%) of the components are sofa-unaware, and most people climbing the learning
Re: CAS and CasView - concrete proposal
Single-sofa code could be made to work using the same current view idea already discussed. But multi-sofa code will have a problem. So I think we need to deprecate addToIndexes(). Not sure about this - because the current view mechanism would seem to make this work, even for multi-sofa. We could even put in code that checked if the item being indexed was a subtype of AnnotationBase, and if so, indexed it in the proper view (if the current view had a Sofa but it was the wrong one). Single Sofa code would work for addToIndexes() by always adding to the index of the initial view. I don't see any way that this method signature can know which other view to use in a multi-Sofa situation. Eddie
Re: CAS and CasView - concrete proposal
On 1/4/07, Marshall Schor [EMAIL PROTECTED] wrote: Adam Lally wrote: So I think we need to deprecate addToIndexes(). Not sure about this - because the current view mechanism would seem to make this work, even for multi-sofa. We could even put in code that checked if the item being indexed was a subtype of AnnotationBase, and if so, indexed it in the proper view (if the current view had a Sofa but it was the wrong one).

Oooh, now I realize this was even more of a problem than I thought. I was thinking of code that currently does this:

    process(JCas aJCas) {
      JCas myView = aJCas.createView(foo);
      MyAnnotation annot = new MyAnnotation(myView);
      annot.addToIndexes();
    }

This indexes annot in myView. With the new refactoring the first line is now an error because the return type of createView is now JCasView. To eliminate the errors the user changes their code to:

    process(JCas aJCas) {
      JCasView myView = aJCas.createView(foo);
      MyAnnotation annot = new MyAnnotation(aJCas);
      annot.addToIndexes();
    }

The original problem I had was - how do we know what view to index the annotation in. I don't think we should just add it to the current view (which would be different than myView). That seems dangerous with no deprecation warning. But I realize there's a bigger problem. In line 2 when we create the annotation, what Sofa do we point it at?? It seems like we would need to do (a) add a MyAnnotation constructor that takes a Sofa as an argument, and/or (b) add a MyAnnotation constructor that takes a JCasView instead of a JCas. Both involve changes to the user's JCas-generated code, or require them to rerun JCasGen, and would require manual updates to any user-written constructor. So that's kind of ugly. -Adam
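To make options (a) and (b) concrete, here is a minimal sketch of what the two constructors might look like. Every class in it (Sofa, JCas, JCasView, MyAnnotation) is a hypothetical stand-in written for this illustration, not the real UIMA or JCasGen-generated code, and the view name "foo" is just the placeholder from the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; none of these are the real UIMA classes.
final class ConstructorOptionsSketch {

    static final class Sofa {
        final String id;
        Sofa(String id) { this.id = id; }
    }

    static final class JCas {
        // Assumption from the thread: after the refactoring, createView returns a JCasView.
        JCasView createView(String sofaId) {
            return new JCasView(this, new Sofa(sofaId));
        }
    }

    static final class JCasView {
        private final JCas jcas;
        private final Sofa sofa;
        private final List<Object> index = new ArrayList<>();

        JCasView(JCas jcas, Sofa sofa) { this.jcas = jcas; this.sofa = sofa; }
        JCas getJCas() { return jcas; }
        Sofa getSofa() { return sofa; }
        void addFsToIndexes(Object fs) { index.add(fs); }
        int indexSize() { return index.size(); }
    }

    static final class MyAnnotation {
        final JCas jcas;
        final Sofa sofa;

        // Option (a): the caller names the Sofa explicitly.
        MyAnnotation(JCas jcas, Sofa sofa) { this.jcas = jcas; this.sofa = sofa; }

        // Option (b): the view supplies both the JCas and the Sofa.
        MyAnnotation(JCasView view) { this(view.getJCas(), view.getSofa()); }
    }

    static String demo() {
        JCas jcas = new JCas();
        JCasView myView = jcas.createView("foo");
        MyAnnotation a = new MyAnnotation(jcas, myView.getSofa()); // option (a)
        MyAnnotation b = new MyAnnotation(myView);                 // option (b)
        // Indexing goes through the view, so there is no ambiguity about where it lands.
        myView.addFsToIndexes(a);
        myView.addFsToIndexes(b);
        return a.sofa.id + "," + b.sofa.id + "," + myView.indexSize();
    }
}
```

In this sketch both constructors end up pointing the annotation at the same Sofa; option (b) simply saves the caller from spelling out myView.getSofa().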
Re: CAS and CasView - concrete proposal
Adam Lally wrote: I put up a Wiki page giving the suggested breakdown of methods between the existing interfaces CommonCas, CAS, JCas and new interfaces CommonCasView, CasView, and JCasView. Please take a look: http://cwiki.apache.org/UIMA/casandcasviewinterfaceredesign.html. -Adam I would propose the following changes: - Leave createFeaturePath() and friends at the CAS. These methods require/return CAS-specific data structures and don't need to be accessible anywhere else. - On CasView, remove getJCasView() and getLowLevelCasView(). Those should be accessed from the JCas and LowLevelCas, respectively. - Similarly, on JCasView, remove getCasView() and getLowLevelCasView(). - On the JCas interface, can we remove some of the APIs and just make them available on the impl object? I'm thinking of things like putJfsFromCaddr(int, FeatureStructure) and getType(int). --Thilo
Re: CAS and CasView - concrete proposal
On 1/4/07, Thilo Goetz [EMAIL PROTECTED] wrote: I would propose the following changes: - Leave createFeaturePath() and friends at the CAS. These methods require/return CAS-specific data structures and don't need to be accessible anywhere else. Marshall had already moved these to CommonCas, so he must have thought they were usable from JCas? I'll let him comment on that, it's orthogonal to the CAS/view interface split I think. - On CasView, remove getJCasView() and getLowLevelCasView(). Those should be accessed from the JCas and LowLevelCas, respectively. - Similarly, on JCasView, remove getCasView() and getLowLevelCasView(). OK, I think I agree that might be cleaner. If we do this then we really have to add getView APIs to LowLevelCas, otherwise there would be no way at all to access a low-level interface to a view. - On the JCas interface, can we remove some of the APIs and just make them available on the impl object? I'm thinking of things like putJfsFromCaddr(int, FeatureStructure) and getType(int). I think these may be called from JCas cover-classes, in which case I think they need to be on the interface. Marshall? Another open issue is the createFS method and variants. I have left them off of the view API for now in deference to Thilo's no convenience methods suggestion, but I'm still a little unsure. Basically the situation now is if a user has a view handle and wants to create an FS they need to do view.getCAS().createFS(...); The upside is that it helps make it clear that FS are owned by the CAS, not the view. The downside is that it may annoy users to have to put in the getCAS call all the time. Also is it inconsistent that view.createAnnotation(...) *is* on the view API? This was done so that users can create annotations that refer to the Sofa for the view they're operating on. What do others think? -Adam
Re: CAS and CasView - concrete proposal
Adam Lally wrote: On 1/4/07, Thilo Goetz [EMAIL PROTECTED] wrote: ... Another open issue is the createFS method and variants. I have left them off of the view API for now in deference to Thilo's no convenience methods suggestion, but I'm still a little unsure. Basically the situation now is if a user has a view handle and wants to create an FS they need to do view.getCAS().createFS(...); Users will have a CAS around, anyway. Where else will they have gotten the view from? I'm not even sure we need a CasView.getCAS(). In what situation would it ever be unclear what CAS a view belongs to? Are you thinking of passing views instead of CASes to process() calls? The upside is that it helps make it clear that FS are owned by the CAS, not the view. The downside is that it may annoy users to have to put in the getCAS call all the time. Also is it inconsistent that view.createAnnotation(...) *is* on the view API? This was done so that users can create annotations that refer to the Sofa for the view they're operating on. An annotation is created with respect to a sofa, not a view. Why not do CAS.createAnnotation(Type, int, int, Sofa) or something? Then View.createAnnotation(Type, int, int) would be a convenience method. --Thilo
Re: CAS and CasView - concrete proposal
Adam Lally wrote: The process call would take a CAS. Inside the body of the process() method there would be no issue, but I'm thinking about other methods that the user has implemented that need access to the indexes and also need to create new FS. I'm sure there are tons of these. IMO having to carry around two object references instead of one would be a pain. Would we now require that such methods take both the CAS and the CasView as arguments? I'm not so happy with that. No, the CAS is sufficient. Then call getView() in the method (once, and cache the result). That's what I would do, anyway. An annotation is created with respect to a sofa, not a view. Why not do CAS.createAnnotation(Type, int, int, Sofa) or something? Then View.createAnnotation(Type, int, int) would be a convenience method. Yes, I had wanted to add CAS.createAnnotation(Type, int, int, Sofa) - thanks for reminding me I had left it off of the Wiki page. But still, I think there's even more need for the convenience function createAnnotation than there is for createFS. Without it we're left with view.getCAS().createAnnotation(type, begin, end, view.getSofa()). Even I would agree that a convenience method makes sense in this case. I just wanted to verify that it actually was a convenience method. --Thilo
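The relationship between the two createAnnotation forms can be sketched as follows. This is only an illustration under stated assumptions: the classes are hypothetical stand-ins rather than the real CAS and CasView interfaces, and type names are plain strings instead of Type objects to keep the example self-contained.

```java
// Hypothetical stand-ins; type names are plain strings here, not UIMA Type objects.
final class CreateAnnotationSketch {

    static final class Sofa {
        final String id;
        Sofa(String id) { this.id = id; }
    }

    static final class Annotation {
        final String type;
        final int begin;
        final int end;
        final Sofa sofa;

        Annotation(String type, int begin, int end, Sofa sofa) {
            this.type = type;
            this.begin = begin;
            this.end = end;
            this.sofa = sofa;
        }
    }

    static final class CAS {
        // The primary operation: an annotation is created with respect to a Sofa.
        Annotation createAnnotation(String type, int begin, int end, Sofa sofa) {
            return new Annotation(type, begin, end, sofa);
        }

        CasView createView(String sofaId) {
            return new CasView(this, new Sofa(sofaId));
        }
    }

    static final class CasView {
        private final CAS cas;
        private final Sofa sofa;

        CasView(CAS cas, Sofa sofa) { this.cas = cas; this.sofa = sofa; }
        CAS getCAS() { return cas; }
        Sofa getSofa() { return sofa; }

        // Convenience only: fills in this view's Sofa and delegates to the CAS.
        Annotation createAnnotation(String type, int begin, int end) {
            return cas.createAnnotation(type, begin, end, sofa);
        }
    }

    static boolean demo() {
        CAS cas = new CAS();
        CasView view = cas.createView("text");
        // The long form that users would otherwise have to write:
        Annotation longForm = view.getCAS().createAnnotation("Person", 0, 4, view.getSofa());
        // The convenience form:
        Annotation shortForm = view.createAnnotation("Person", 0, 4);
        return longForm.sofa == shortForm.sofa && shortForm.begin == 0 && shortForm.end == 4;
    }
}
```

Because the view method only forwards to the CAS, the sketch keeps the idea that the annotation is tied to a Sofa, not to the view object itself.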
Re: CAS and CasView - concrete proposal
On 1/4/07, Thilo Goetz [EMAIL PROTECTED] wrote: Adam Lally wrote: The process call would take a CAS. Inside the body of the process() method there would be no issue, but I'm thinking about other methods that the user has implemented that need access to the indexes and also need to create new FS. I'm sure there are tons of these. IMO having to carry around two object references instead of one would be a pain. Would we now require that such methods take both the CAS and the CasView as arguments? I'm not so happy with that. No, the CAS is sufficient. Then call getView() in the method (once, and cache the result). That's what I would do, anyway. For multi-sofa annotators, the method may not know which view to get. So if nothing else the view-name may need to be an argument. I'm thinking of some general purpose function that, say, looks at a Sofa and creates Person annotations. I think it would be normal to write this as annotatePersons(CasView view), so I could call it on different views if I wanted. If we were only creating annotations then just the CasView would be sufficient. But if the Person annotations needed to refer to non-annotation FS that also needed to be created, I'd be stuck unless: (a) the framework provides view.getCas() (b) the framework provides view.createFS() (c) I have to change my method signature to take a CAS as an argument, in addition to either the view or the view-name. -Adam
Re: CAS and CasView - concrete proposal
Adam Lally wrote: On 1/4/07, Thilo Goetz [EMAIL PROTECTED] wrote: I would propose the following changes: - Leave createFeaturePath() and friends at the CAS. These methods require/return CAS-specific data structures and don't need to be accessible anywhere else. Marshall had already moved these to CommonCas, so he must have thought they were usable from JCas? I'll let him comment on that, it's orthogonal to the CAS/view interface split I think.

The reason they were put into the CommonCas is because I thought users of either the CAS or JCas interfaces might want to use createFeaturePath. These are used for filtered iterators, I think. Here's an example, from a parser that produces various kinds of nodes, done in JCas; it uses the older APIs, so has to get a CAS ref at the start:

    CAS cas = jcas.getCas();
    // filter the iterator to only return top parse frames
    // Top parse frames have Sgc.POS.incomplete or Sgc.POS.Top as slot-filled type
    // Start by getting the constraint factory from the CAS.
    ConstraintFactory cf = cas.getConstraintFactory();
    // Create empty path.
    FeaturePath path = cas.createFeaturePath();
    // Add XsgParse slotName feature to path, creating one-element path.
    path.addFeature(
        ((XsgParse_Type) jcas.getType(XsgParse.typeIndexID)).casFeat_slotName);
    FSStringConstraint slotIsTop = cf.createStringConstraint();
    FSStringConstraint slotIsIncomplete = cf.createStringConstraint();
    slotIsTop.equals(Sgc.POS.top.toString());
    slotIsIncomplete.equals(Sgc.POS.incomplete.toString());
    FSMatchConstraint embeddedTop = cf.embedConstraint(path, slotIsTop);
    FSMatchConstraint embeddedInc = cf.embedConstraint(path, slotIsIncomplete);
    FSMatchConstraint topOrInc = cf.or(embeddedTop, embeddedInc);
    // Create a filtered iterator from some annotation iterator.
    return cas.createFilteredIterator(
        jcas.getJFSIndexRepository().getAnnotationIndex(XsgParse.type).iterator(),
        topOrInc);

There is one kludgy part - getting the Feature value from the JCas structures.
- On CasView, remove getJCasView() and getLowLevelCasView(). Those should be accessed from the JCas and LowLevelCas, respectively. - Similarly, on JCasView, remove getCasView() and getLowLevelCasView(). OK, I think I agree that might be cleaner. If we do this then we really have to add getView APIs to LowLevelCas, otherwise there would be no way at all to access a low-level interface to a view. - On the JCas interface, can we remove some of the APIs and just make them available on the impl object? I'm thinking of things like putJfsFromCaddr(int, FeatureStructure) and getType(int). I think these may be called from JCas cover-classes, in which case I think they need to be on the interface. Marshall? The putJfsFromCaddr is called from the JCas cover-classes. It could be in impl, though - we could change JCasGen and the migration tool. The getType(int) is used by users - should remain in the API. This method gives JCas users efficient access to the Type object corresponding to a JCas cover class: getType(MyJCasCoverClass.type); the Type object is needed by some APIs. Another open issue is the createFS method and variants. I have left them off of the view API for now in deference to Thilo's no convenience methods suggestion, but I'm still a little unsure. Basically the situation now is if a user has a view handle and wants to create an FS they need to do view.getCAS().createFS(...); The upside is that it helps make it clear that FS are owned by the CAS, not the view. The downside is that it may annoy users to have to put in the getCAS call all the time. Also is it inconsistent that view.createAnnotation(...) *is* on the view API? This was done so that users can create annotations that refer to the Sofa for the view they're operating on. What do others think? Users have already complained about this kind of thing. 
They have said they don't want to have to follow dereferencing chains to reach an object that finally has a method they need - they want the framework to do that for them. I don't think we have compelling enough reasons to require users to a) getViews from CASes, and also then b) dereference from the View back to CASes to do the real work of creating / accessing / updating Feature Structures. I think that the APIs for users should focus on what the users need to get done, with an eye toward economizing on the verbose-ness of the resulting code (sorry - I mean to say, the user's code should be very readable - and having dereferencing chains makes it less readable, I think). -Marshall
Re: CAS and CasView - concrete proposal
Adam Lally wrote: On 1/4/07, Thilo Goetz [EMAIL PROTECTED] wrote: Adam Lally wrote: The process call would take a CAS. Inside the body of the process() method there would be no issue, but I'm thinking about other methods that the user has implemented that need access to the indexes and also need to create new FS. I'm sure there are tons of these. IMO having to carry around two object references instead of one would be a pain. Would we now require that such methods take both the CAS and the CasView as arguments? I'm not so happy with that. No, the CAS is sufficient. Then call getView() in the method (once, and cache the result). That's what I would do, anyway. For multi-sofa annotators, the method may not know which view to get. So if nothing else the view-name may need to be an argument. I'm thinking of some general purpose function that, say, looks at a Sofa and creates Person annotations. I think it would be normal to write this as annotatePersons(CasView view), so I could call it on different views if I wanted. If we were only creating annotations then just the CasView would be sufficient. But if the Person annotations needed to refer to non-annotation FS that also needed to be created, I'd be stuck unless: (a) the framework provides view.getCas() (b) the framework provides view.createFS() (c) I have to change my method signature to take a CAS as an argument, in addition to either the view or the view-name. -Adam Users will have to change their method signatures anyway, as we're breaking multi-view code. However, I can see the case for (a).
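A small sketch of option (a) may help show why it keeps the helper's signature down to a single argument. Everything here is hypothetical: the CAS, CasView, FS and PersonAnnotation classes are toy stand-ins, createPerson is an invented helper, and the "detector" just looks for one hard-coded name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; not the real UIMA classes.
final class AnnotatePersonsSketch {

    static class FS {
        final String type;
        FS(String type) { this.type = type; }
    }

    static final class PersonAnnotation extends FS {
        final int begin;
        final int end;
        final FS record; // a non-annotation FS that the annotation refers to

        PersonAnnotation(int begin, int end, FS record) {
            super("Person");
            this.begin = begin;
            this.end = end;
            this.record = record;
        }
    }

    static final class CAS {
        private final List<FS> heap = new ArrayList<>(); // every FS is owned by the CAS

        FS createFS(String type) {
            FS fs = new FS(type);
            heap.add(fs);
            return fs;
        }

        PersonAnnotation createPerson(int begin, int end, FS record) {
            PersonAnnotation p = new PersonAnnotation(begin, end, record);
            heap.add(p);
            return p;
        }

        CasView createView(String text) { return new CasView(this, text); }
        int fsCount() { return heap.size(); }
    }

    static final class CasView {
        private final CAS cas;
        private final String text;
        private final List<FS> index = new ArrayList<>();

        CasView(CAS cas, String text) { this.cas = cas; this.text = text; }
        CAS getCas() { return cas; } // option (a): the view can hand back its CAS
        String getText() { return text; }
        void addFsToIndexes(FS fs) { index.add(fs); }
        List<FS> indexed() { return index; }
    }

    // Takes only the view, so the same helper can be called on any view.
    static void annotatePersons(CasView view) {
        String name = "Alice"; // toy detector: a single hard-coded name
        int at = view.getText().indexOf(name);
        while (at >= 0) {
            FS record = view.getCas().createFS("PersonRecord");
            view.addFsToIndexes(view.getCas().createPerson(at, at + name.length(), record));
            at = view.getText().indexOf(name, at + name.length());
        }
    }

    static String demo() {
        CAS cas = new CAS();
        CasView english = cas.createView("Alice met Bob. Alice left.");
        CasView german = cas.createView("Bob traf Alice.");
        annotatePersons(english);
        annotatePersons(german);
        return english.indexed().size() + "," + german.indexed().size() + "," + cas.fsCount();
    }
}
```

With option (c) the helper would instead need a signature along the lines of annotatePersons(CAS cas, CasView view), which is the extra argument the discussion above is trying to avoid.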
Re: CAS and CasView - concrete proposal
On 1/4/07, Marshall Schor [EMAIL PROTECTED] wrote: - On the JCas interface, can we remove some of the APIs and just make them available on the impl object? I'm thinking of things like putJfsFromCaddr(int, FeatureStructure) and getType(int). I think these may be called from JCas cover-classes, in which case I think they need to be on the interface. Marshall? The putJfsFromCaddr is called from the JCas cover-classes. It could be in impl, though - we could change JCasGen and the migration tool. Then the JCas cover class would have to do a typecast, and require that the JCas passed to its constructor was in fact a JCasImpl, which doesn't seem good (unless I'm missing something). I don't think we have compelling enough reasons to require users to a) getViews from CASes, and also then b) dereference from the View back to CASes to do the real work of creating / accessing / updating Feature Structures. I think that the APIs for users should focus on what the users need to get done, with an eye toward economizing on the verbose-ness of the resulting code (sorry - I mean to say, the user's code should be very readable - and having dereferencing chains makes it less readable, I think). I think users object more to the write-ability, which is partly how many keystrokes you need but also just how easy it is to learn and remember what it is you're supposed to write. Readability is subjective... We need to be clear on what the concepts we're trying to communicate are. The code view.getCAS().createFS(...) may be considered more readable if it's an important concept that FeatureStructures are created on the CAS, not on a view. However, are we sure that's such an important concept? What if we say that a CasView is a particular window on a CAS, a way of looking at it. That seems kind of consistent with the word view. And it doesn't then seem so bad that you can make changes to the CAS through the view, such as adding FeatureStructures. -Adam
Re: CAS and CasView - concrete proposal
On 1/4/07, Marshall Schor [EMAIL PROTECTED] wrote: Adam Lally wrote: FYI I made updates to the Wiki page - see my comments on the page for details. -Adam I probably just missed it, but given a JCasView, how do I get the corresponding JCas? Thanks for catching that omission. I have added JCasView.getJCas() to the Wiki. -Adam
Re: CAS and CasView - concrete proposal
There's another issue with JCas we haven't considered yet - the addToIndexes() method on JCasGen-erated classes. When this is called, it needs to know what index repository (what view) to index them in. Currently, this uses whichever view (meaning a JCas instance) was passed to the constructor when the object was created. With this refactoring, new objects would presumably be given a reference to the one-and-only JCas, never a JCasView. Single-sofa code could be made to work using the same current view idea already discussed. But multi-sofa code will have a problem. So I think we need to deprecate addToIndexes(). We can add a new method addToIndexes(JCasView) in its place, and/or require the use of JCasView.addToIndexes(FS) instead. -Adam
Re: CAS and CasView - concrete proposal
Adam Lally wrote: There's another issue with JCas we haven't considered yet - the addToIndexes() method on JCasGen-erated classes. When this is called, it needs to know what index repository (what view) to index them in. Same for removeFromIndexes() of course :-) Currently, this uses whichever view (meaning a JCas instance) was passed to the constructor when the object was created. With this refactoring, new objects would presumably be given a reference to the one-and-only JCas, never a JCasView. Single-sofa code could be made to work using the same current view idea already discussed. But multi-sofa code will have a problem. So I think we need to deprecate addToIndexes(). Not sure about this - because the current view mechanism would seem to make this work, even for multi-sofa. We could even put in code that checked if the item being indexed was a subtype of AnnotationBase, and if so, indexed it in the proper view (if the current view had a Sofa but it was the wrong one). To intentionally index a JCas cover object in another View, there is always the otherJCasView.addToIndexes(FS) method. So I'm not sure about the value of deprecating this. We can add a new method addToIndexes(JCasView) in its place, I like (prefer) this, but admit it seems redundant with the following and/or require the use of JCasView.addToIndexes(FS) instead. -Marshall
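The difference between the no-argument addToIndexes() and the proposed addToIndexes(JCasView) can be sketched like this. The classes are hypothetical stand-ins, the view names "initial" and "translation" are invented for the example, and the "current view" is simply fixed to the first view created.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; not the real JCas classes.
final class AddToIndexesSketch {

    static final class JCasView {
        final String name;
        final List<TOP> index = new ArrayList<>();

        JCasView(String name) { this.name = name; }
        void addToIndexes(TOP fs) { index.add(fs); }
    }

    static final class JCas {
        private final JCasView currentView = new JCasView("initial");

        JCasView createView(String name) { return new JCasView(name); }
        JCasView getCurrentView() { return currentView; }
    }

    static class TOP {
        private final JCas jcas;

        TOP(JCas jcas) { this.jcas = jcas; }

        /** @deprecated single-sofa compatibility only: always indexes in the current view. */
        @Deprecated
        void addToIndexes() { jcas.getCurrentView().addToIndexes(this); }

        // Proposed replacement: the caller says which view to index in.
        void addToIndexes(JCasView view) { view.addToIndexes(this); }
    }

    @SuppressWarnings("deprecation")
    static String demo() {
        JCas jcas = new JCas();
        JCasView translation = jcas.createView("translation");
        TOP legacy = new TOP(jcas);
        TOP explicit = new TOP(jcas);
        legacy.addToIndexes();              // lands in the current view
        explicit.addToIndexes(translation); // lands in the view that was named
        return jcas.getCurrentView().index.size() + "," + translation.index.size();
    }
}
```

The sketch leaves out the AnnotationBase check suggested above (redirecting an annotation to the view matching its Sofa); that would be an extra branch inside the deprecated method.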
Re: CAS and CasView - concrete proposal
I put up a Wiki page giving the suggested breakdown of methods between the existing interfaces CommonCas, CAS, JCas and new interfaces CommonCasView, CasView, and JCasView. Please take a look: http://cwiki.apache.org/UIMA/casandcasviewinterfaceredesign.html. -Adam
Re: CAS and CasView - concrete proposal
On 12/30/06, Thilo Goetz [EMAIL PROTECTED] wrote: So your proposal is to leave things as they are, except that we call some of the things that we used to call a CAS a CasView. We're not going to touch how indexing works, at least conceptually. We could implement this proposal by simply making the CASImpl class implement the CasView interface and we would be more or less done. Is that a correct interpretation, or did I miss something? Pretty much... the basic objective was to split CAS and CasView so it would be apparent when you were looking at the whole CAS and when you were looking at a view. To clarify some specific differences between CAS and CasView: * CasView doesn't have getView(...) methods * CAS methods for sofa/index access are deprecated but forward to the current view (contrast with the situation today where they just don't work on the base CAS - returning null or throwing exceptions) -Adam
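As a rough picture of the two differences listed above, the split might look like the following. The interfaces and implementations are hypothetical, heavily reduced stand-ins (indexed items are plain strings, and the current view is fixed to a view named "initial"); the real interfaces on the Wiki page have many more methods.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, reduced stand-ins for illustration only.
final class CasViewSplitSketch {

    // A view: sofa and index access, but no getView(...) methods.
    interface CasView {
        String getSofaName();
        List<String> getIndexedItems();
        void addToIndexes(String item);
    }

    // The whole CAS: hands out views; its own sofa/index methods are deprecated forwarders.
    interface CAS {
        CasView createView(String name);
        CasView getView(String name);

        /** @deprecated forwards to the current view. */
        @Deprecated
        String getSofaName();

        /** @deprecated forwards to the current view. */
        @Deprecated
        List<String> getIndexedItems();
    }

    static final class ViewImpl implements CasView {
        private final String sofaName;
        private final List<String> items = new ArrayList<>();

        ViewImpl(String sofaName) { this.sofaName = sofaName; }
        public String getSofaName() { return sofaName; }
        public List<String> getIndexedItems() { return items; }
        public void addToIndexes(String item) { items.add(item); }
    }

    static final class CasImpl implements CAS {
        private final Map<String, CasView> views = new HashMap<>();
        private final CasView currentView;

        CasImpl() { currentView = createView("initial"); }

        public CasView createView(String name) {
            CasView view = new ViewImpl(name);
            views.put(name, view);
            return view;
        }

        public CasView getView(String name) { return views.get(name); }

        @Deprecated
        public String getSofaName() { return currentView.getSofaName(); }

        @Deprecated
        public List<String> getIndexedItems() { return currentView.getIndexedItems(); }
    }

    @SuppressWarnings("deprecation")
    static String demo() {
        CAS cas = new CasImpl();
        cas.getView("initial").addToIndexes("token-1");
        cas.createView("other").addToIndexes("token-2");
        // The deprecated CAS-level calls see only the current view's contents.
        return cas.getSofaName() + ":" + cas.getIndexedItems();
    }
}
```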
Re: CAS and CasView - concrete proposal
Adam Lally wrote: On 12/30/06, Thilo Goetz [EMAIL PROTECTED] wrote: So your proposal is to leave things as they are, except that we call some of the things that we used to call a CAS a CasView. We're not going to touch how indexing works, at least conceptually. We could implement this proposal by simply making the CASImpl class implement the CasView interface and we would be more or less done. Is that a correct interpretation, or did I miss something? Pretty much... the basic objective was to split CAS and CasView so it would be apparent when you were looking at the whole CAS and when you were looking at a view. To clarify some specific differences between CAS and CasView: * CasView doesn't have getView(...) methods * CAS methods for sofa/index access are deprecated but forward to the current view (contrast with the situation today where they just don't work on the base CAS - returning null or throwing exceptions) -Adam I wouldn't mind doing this as a first step, but I'm concerned about the future. If we need to support this approach going forward, I would prefer if we could answer the questions about the relation between the CAS and CasViews first: how are indexes in the CAS related to indexes in CasViews? If we're ok with maybe changing this again in the next release, I'm ok with starting like this. --Thilo
Re: CAS and CasView - concrete proposal
On 1/2/07, Thilo Goetz [EMAIL PROTECTED] wrote: I wouldn't mind doing this as a first step, but I'm concerned about the future. If we need to support this approach going forward, I would prefer if we could answer the questions about the relation between the CAS and CasViews first: how are indexes in the CAS related to indexes in CasViews? If we're ok with maybe changing this again in the next release, I'm ok with starting like this. This proposal only has indexes in CasViews, not indexes that belong directly to the CAS. (Unless you meant the deprecated index-access methods on CAS that use the current view.) I would also like to figure out if there should be such a thing as indexes that belong directly to the CAS (global indexes?), but it seemed like we were too far from a consensus on that to get anything done for 2.1. We can always add additional methods to CAS (e.g. getGlobalIndexRepository()) in a later version if we decide that's right. And we probably can't redefine any of the existing indexing methods on CAS without breaking a lot of code anyway. So it smells like starting with this proposal will not get in the way of future enhancements. -Adam
Re: CAS and CasView - concrete proposal
On 1/2/07, Marshall Schor [EMAIL PROTECTED] wrote: I think this proposal also has one set of index definitions, and each view gets its own private set of index-instances for these definitions. Correct. Will the methods not really associated with a CAS object (they are or could be static methods) still be on the CAS or CommonCas: createFilteredIterator, getConstraintFactory, createFeaturePath, createFeatureValuePath, and fs2listIterator I'm not sure they can be static, as they may depend on the type system. Some of them, anyway. I think these should still be on (Common)CAS, but might also be on CasView for convenience. This is starting to look like a slippery slope, though; pretty soon everything is in two places. I suggest that the methods that belong in the CasView be left there (deprecated) to operate on the current view: get/set for Sofa things like DocumentText, SofaDataURI, etc., getSofa getIndexRepository getAnnotationIndex(type) get/set associated with DocumentAnnotation (I think there is one of these per view - agree?) add/removeFsTo/FromIndexes Agree. For things like createView and getView(String or FS) - I'm ok with requiring these to be on the CAS Api only, but also wouldn't object if they were on the CasView API for convenience. I prefer leaving them off the CasView. The CAS has a getLowLevelCAS() method; the low level CAS includes both things for FSs and also for IndexRepositories. The index repository things should be looked at carefully to see if they should go with the view (with perhaps convenience functions working on the current view in the CAS Api). Good point... we may need a LowLevelCasView. (Currently the situation with LowLevelCas is the same as with CAS - an instance of LowLevelCas could either be referring to a view or to the base CAS.) And don't forget JCas, where we'll need a JCasView. When I get a minute I may try to compile a complete list of proposed changes, maybe on the Wiki. Finally :-) we have getSofaIterator... 
on the CAS Api (I'm not distinguishing between CAS and CommonCas APIs here). I suggest for 2.1 we lock the association between 1 view == 1 sofa. So we won't need getViewIterator, nor have to figure out how to name views separately from Sofas. Is that too restrictive? I agree, Sofas and Views are still 1-1 for the time being. And getSofaIterator would only be on CAS, not CasView. -Adam
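A rough sketch of the point confirmed above (one shared set of index definitions, with each view holding its own private index instances) might look like the following. Everything here is hypothetical: ViewIndexRepositorySketch is not a UIMA class, index definitions are reduced to plain labels, and feature structures are reduced to strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the UIMA API: the index definitions (here just
// labels) are shared, while every view builds its own index instances.
class ViewIndexRepositorySketch {
    // label -> feature structures indexed in THIS view only
    private final Map<String, List<String>> indexes = new HashMap<>();

    ViewIndexRepositorySketch(List<String> sharedIndexLabels) {
        for (String label : sharedIndexLabels) {
            indexes.put(label, new ArrayList<>());
        }
    }

    void addFS(String label, String fs) {
        lookup(label).add(fs);
    }

    int size(String label) {
        return lookup(label).size();
    }

    private List<String> lookup(String label) {
        List<String> index = indexes.get(label);
        if (index == null) {
            throw new IllegalArgumentException("No index defined with label " + label);
        }
        return index;
    }
}

class IndexDefinitionDemo {
    public static void main(String[] args) {
        // One definition list, shared by both views (the label is made up).
        List<String> definitions = Arrays.asList("annotationIndex");
        ViewIndexRepositorySketch viewA = new ViewIndexRepositorySketch(definitions);
        ViewIndexRepositorySketch viewB = new ViewIndexRepositorySketch(definitions);

        viewA.addFS("annotationIndex", "fs1");
        System.out.println("view A: " + viewA.size("annotationIndex"));
        System.out.println("view B: " + viewB.size("annotationIndex"));
    }
}
```

Adding to one view's index leaves the other view's index untouched, which is the behaviour the proposal keeps from the existing design.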
Re: CAS and CasView - concrete proposal
Marshall Schor wrote: snip Will the methods not really associated with a CAS object (they are or could be static methods) still be on the CAS or CommonCas: createFilteredIterator, getConstraintFactory, createFeaturePath, createFeatureValuePath, and fs2listIterator I suggest that the methods that belong in the CasView be left there (deprecated) to operate on the current view: get/set for Sofa things like DocumentText, SofaDataURI, etc., getSofa getIndexRepository getAnnotationIndex(type) get/set associated with DocumentAnnotation (I think there is one of these per view - agree?) add/removeFsTo/FromIndexes Adam suggested that the methods that belong in the CAS for creating FeatureStructures and get/setting their fields be made to work in the CasView API, for convenience; I agree with that. For things like createView and getView(String or FS) - I'm ok with requiring these to be on the CAS Api only, but also wouldn't object if they were on the CasView API for convenience. The time to add convenience methods is either a) never, or b) when the current APIs have been found inconvenient. Let's figure out where things belong first, and gather some experience with the setup. Then, if things are so inconvenient that we need to sacrifice some conceptual clarity, let's by all means introduce some carefully selected convenience functions. I vote for no convenience functions regarding CAS/CasView functionality in this release. Let's get people's heads wrapped around the concepts first before we start to muddle them with convenience functions. --Thilo
Re: CAS and CasView - concrete proposal
Adam Lally wrote: On 1/2/07, Marshall Schor [EMAIL PROTECTED] wrote: snip The CAS has a getLowLevelCAS() method; the low level CAS includes both things for FSs and also for IndexRepositories. The index repository things should be looked at carefully to see if they should go with the view (with perhaps convenience functions working on the current view in the CAS Api). Good point... we may need a LowLevelCasView. (Currently the situation with LowLevelCas is the same as with CAS - an instance of LowLevelCas could either be referring to a view or to the base CAS.) Unfortunately, the low-level CAS is missing the base CAS functionality. All the sofa/view stuff was implemented at the CAS level only. This is something that should be fixed. --Thilo
Re: CAS and CasView - concrete proposal
Thilo Goetz wrote: snip Unfortunately, the low-level CAS is missing the base CAS functionality. All the sofa/view stuff was implemented at the CAS level only. This is something that should be fixed. My understanding of the low-level interfaces is that they are there to support the no-Java-object-for-CAS-object scenario, representing CAS objects as ints. The ll (low level) APIs come in checking and non-checking versions too. The Sofa/View stuff would need to be implemented as low-level, I think, only if we conclude there is a need for a no-Java-object-for-CAS-object scenario here. By this I mean things like getView(SofaFS) - we could imagine a version of this which, instead of taking a Java cover object (JCas or CAS) for the Sofa instance in the CAS, would take an int. I don't think this is really needed, though, because I have a hard time imagining the use case where this would make an observable difference. (but maybe I'm wrong here). Sofa/View stuff would need to be there to allow the ll APIs to work on the things they were designed for such as creating new Feature Structures, iterating over them, etc, but in a View. -Marshall
CAS and CasView - concrete proposal
Well, the concrete may not have quite set yet... but here goes:

1. Goals

The following are confusing (or some might say, broken):

(a) the interface CAS can be an interface to either the whole CAS or to a view. Methods like this are poor:

    CAS view = cas.getView(name);

(b) the logic determining which CAS (a view or the whole CAS) gets passed to an annotator's process method is needlessly complicated.

We would like to improve this in v2.1, so we have 3 weeks (starting now) to implement it. It's acceptable if what we do breaks multi-view annotators/applications, but it cannot break single-view annotators/applications. We want whatever we do to be easier to document and explain to users than what we currently have.

2. Proposed Solution

We don't plan to change the fundamental design of views at this point - there isn't time and it's too controversial. A view still consists of an index repository and a Sofa. (Yes, I know someday a view may not have a Sofa - but for now, it does.)

A. New CasView interface

We create a new interface CasView. All of the CAS.getView() methods will now return type CasView (instead of CAS). The CasView interface will contain all of the sofa-access methods and indexing-related methods that are on the CAS interface.

A more controversial question is whether you can create FS from a CasView - i.e., does the method CasView.createFS(Type) exist? What about CasView.createAnnotation(Type, int begin, int end)? In some previous discussions we said no, FS creation is on the CAS only. This communicates to the user that, logically, FS creation is an operation on the CAS as a whole. However after thinking about this more I think that may be too inconvenient. If I have a handle to a view and want to create an FS I'd have to do:

    myView.getCAS().createFS(type);

which is a little tedious. (and this assumes we have CasView.getCAS(), without which it is much worse). And what about annotations, which need a Sofa reference, so we would need something like:

    myView.getCAS().createAnnotation(type, myView.getSofa(), begin, end)

which is too ugly to consider.

All in all I think it would be better to allow FS creation on the CasView interface as well as the CAS interface. I think we can explain this. A view is a window into a CAS - a particular way of looking at it - it should be a fully functional interface for interacting with the CAS from that viewpoint. And that would include creating new FS.

B. Backwards Compatibility

To meet the goal of being compatible with single-view annotators, we will use the following strategy: The idea is that a CAS has a current view. Any methods on the CAS that are view-oriented will apply to the current view. This includes but is not limited to:

    getSofa()
    getDocumentText()
    getIndexRepository()
    addFsToIndexes()
    createAnnotation(int begin, int end) //needs to know which Sofa to refer to

The current view is determined by the framework and can be different for different annotators. For single-sofa annotators the current view is the view that the annotator should process, as determined by sofa mappings in the usual way.

Note that this approach also allows single-sofa application code to work. We have a lot of code that does:

    AnalysisEngine ae = ...
    CAS cas = ae.newCAS();
    cas.setDocumentText(someString);
    ae.process(cas);

and it would be really nice if this continues to work. We could deprecate these APIs and encourage people to switch to the view-oriented APIs, which would be something like:

    AnalysisEngine ae = ...
    CAS cas = ae.newCAS();
    CasView initialView = cas.getInitialView();
    initialView.setDocumentText(someString);
    ae.process(cas);
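The backwards-compatibility strategy in section B (deprecated, view-oriented methods on the CAS that forward to a framework-chosen current view) could be sketched roughly as below. This is only an illustration under assumptions: SketchCas, SketchCasView, SketchCasViewImpl, setCurrentView and the "initialView" name are all made up for the example and are not the UIMA interfaces or their implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the proposed CasView interface (text access only).
interface SketchCasView {
    String getViewName();
    String getDocumentText();
    void setDocumentText(String text);
}

class SketchCasViewImpl implements SketchCasView {
    private final String name;
    private String documentText;

    SketchCasViewImpl(String name) { this.name = name; }

    public String getViewName() { return name; }
    public String getDocumentText() { return documentText; }
    public void setDocumentText(String text) { this.documentText = text; }
}

// Hypothetical stand-in for the CAS: owns the views and tracks a current view.
class SketchCas {
    static final String INITIAL_VIEW_NAME = "initialView"; // assumed name

    private final Map<String, SketchCasView> views = new LinkedHashMap<>();
    private SketchCasView currentView;

    SketchCas() {
        currentView = createView(INITIAL_VIEW_NAME);
    }

    SketchCasView createView(String name) {
        if (views.containsKey(name)) {
            throw new IllegalArgumentException("View already exists: " + name);
        }
        SketchCasView view = new SketchCasViewImpl(name);
        views.put(name, view);
        return view;
    }

    SketchCasView getView(String name) { return views.get(name); }

    SketchCasView getInitialView() { return views.get(INITIAL_VIEW_NAME); }

    // Assumed to be called by the framework (e.g. after applying sofa
    // mappings), not by annotator code.
    void setCurrentView(SketchCasView view) { this.currentView = view; }

    // Deprecated view-oriented methods: they forward to the current view
    // instead of returning null or throwing on the base CAS.
    @Deprecated
    String getDocumentText() { return currentView.getDocumentText(); }

    @Deprecated
    void setDocumentText(String text) { currentView.setDocumentText(text); }
}

class CurrentViewDemo {
    public static void main(String[] args) {
        SketchCas cas = new SketchCas();
        cas.setDocumentText("some text");            // old single-sofa style
        System.out.println(cas.getInitialView().getDocumentText());
    }
}
```

Under this sketch, old single-sofa code that calls setDocumentText on the CAS and new code that goes through getInitialView() end up touching the same view, which is the compatibility property the proposal is after.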
Re: CAS and CasView - concrete proposal
More on hierarchies of implementation objects, and saving the user from writing dereferencing chains: Suppose we divide the CAS methods into those which would just not make sense on the CasView API, and others. In the same spirit of pleasing the users by avoiding what they could see as unnecessary dereferencing, would we make *all* the others also available via delegation to the CasView API? In other words, how many dereferencings would we want to shave off for users who wanted to get Type? Would they write: aCasView.getCas().getTypeSystem().getType(String) or aCasView.getTypeSystem().getType(String) or aCasView.getType(String)? And for indexes: aCasView.getIndexRepository().getIndex(name, aType); or aCasView.getIndex(name, aType); Users have often requested the shorter forms - they seem to often consider how the impl has organized these things into object hierarchies as an implementation detail they'd rather not be bothered with. JCas (for better or worse) has some of these shorter forms (for getting indexes, for instance). -Marshall
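To make Marshall's question about dereferencing chains concrete, here is a small sketch of the three call shapes, with the shorter forms implemented by delegation. The classes ShortcutTypeSystem, ShortcutCas and ShortcutCasView are hypothetical stand-ins invented for this example, and a type is reduced to its name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical type system: a type is represented only by its name.
class ShortcutTypeSystem {
    private final Map<String, String> types = new HashMap<>();

    void addType(String name) { types.put(name, name); }

    String getType(String name) { return types.get(name); }
}

// Hypothetical CAS that owns the type system.
class ShortcutCas {
    private final ShortcutTypeSystem typeSystem = new ShortcutTypeSystem();

    ShortcutTypeSystem getTypeSystem() { return typeSystem; }
}

// Hypothetical view offering the shorter forms purely by delegation.
class ShortcutCasView {
    private final ShortcutCas cas;

    ShortcutCasView(ShortcutCas cas) { this.cas = cas; }

    ShortcutCas getCas() { return cas; }

    // One dereference shaved off.
    ShortcutTypeSystem getTypeSystem() { return cas.getTypeSystem(); }

    // Two dereferences shaved off.
    String getType(String name) { return getTypeSystem().getType(name); }
}

class ShortcutDemo {
    public static void main(String[] args) {
        ShortcutCas cas = new ShortcutCas();
        cas.getTypeSystem().addType("example.Token"); // made-up type name
        ShortcutCasView view = new ShortcutCasView(cas);

        System.out.println(view.getCas().getTypeSystem().getType("example.Token"));
        System.out.println(view.getTypeSystem().getType("example.Token"));
        System.out.println(view.getType("example.Token"));
    }
}
```

All three forms return the same object; the trade-off raised in the thread is that each shortcut duplicates a method in a second place.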
Re: CAS and CasView - concrete proposal
On 12/29/06, Marshall Schor [EMAIL PROTECTED] wrote: snip/ It seems to me you will need a CasViewImpl class - this is for the use case where the user wants to, e.g., run two iterators together, one iterating over one view, while the other goes over another view. The actual objects that implement CAS views could be tiny - just a specification of which view it was, plus a ref to the shared CASImpl. Is this what you're thinking? Details... Yes, I think there would need to be a CasViewImpl. I don't have strong feelings about how to implement it. But unless there are reasons to change things, I think we may want to stick close to how views are currently implemented, which is that the CasViewImpl owns its index repository and also has a direct reference to the heap and other internal parts of the CAS. These references save having to make an extra method call to implement methods such as createFS. Also, I forgot to mention in the proposal - I think we'll need to have a JCasView as well. Some thought should be put into how this relates to your recent refactoring that produced CommonCas... we may need a CommonCasView. But I think we can work these details out if the general ideas in my proposal are acceptable. -Adam
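The implementation shape Adam describes (each view owns its index repository but holds a direct reference to the shared heap) can be sketched as follows. This is a toy model under assumptions: HeapSketch and ViewSketch are invented names, the heap is a list of strings, and feature structures are referred to by int address only.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical shared heap: feature structures live here, addressed by int.
class HeapSketch {
    private final List<String> cells = new ArrayList<>();

    int allocate(String payload) {
        cells.add(payload);
        return cells.size() - 1;
    }

    String get(int address) { return cells.get(address); }
}

// Hypothetical view: direct heap reference plus a privately owned index.
class ViewSketch {
    private final HeapSketch heap;                          // shared by all views
    private final List<Integer> index = new ArrayList<>();  // owned by this view

    ViewSketch(HeapSketch heap) { this.heap = heap; }

    // Writes straight into the shared heap, with no detour through a CAS object.
    int createFS(String payload) { return heap.allocate(payload); }

    void addFsToIndexes(int address) { index.add(address); }

    String resolve(int address) { return heap.get(address); }

    Iterator<Integer> iterator() { return index.iterator(); }
}

class TwoViewIterationDemo {
    public static void main(String[] args) {
        HeapSketch heap = new HeapSketch();
        ViewSketch english = new ViewSketch(heap);
        ViewSketch german = new ViewSketch(heap);

        english.addFsToIndexes(english.createFS("token: hello"));
        german.addFsToIndexes(german.createFS("token: hallo"));

        // Two iterators running together, one per view.
        Iterator<Integer> e = english.iterator();
        Iterator<Integer> g = german.iterator();
        while (e.hasNext() && g.hasNext()) {
            System.out.println(english.resolve(e.next()) + " | " + german.resolve(g.next()));
        }
    }
}
```

Because the heap is shared, an FS created through one view can be resolved (or indexed) through another, while iteration over each view's index stays independent.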
Re: CAS and CasView - concrete proposal
So your proposal is to leave things as they are, except that we call some of the things that we used to call a CAS a CasView. We're not going to touch how indexing works, at least conceptually. We could implement this proposal by simply making the CASImpl class implement the CasView interface and we would be more or less done. Is that a correct interpretation, or did I miss something? --Thilo