Re: CAS and CasView - concrete proposal

2007-01-06 Thread Marshall Schor

Adam Lally wrote:

On 1/5/07, Marshall Schor [EMAIL PROTECTED] wrote:

Solution 1:

How about always passing in a JCasView object?  For unaware components,
this would be the view to use.  For view aware components, this would be
some view (perhaps picked in a similar way), but the user code would be
expected to use this or change to other views and use those.



You mean passing a JCasView to the user's process method.  I'm not
seeing how that really helps, and it would create more compatibility
issues for existing code (essentially requiring JCas to be replaced by
JCasView everywhere).  I don't think I like this.


I lost the train of thinking about what is passed to Annotators in the
JCas case: is it a JCas object or a JCasView object? It sounds like, in
your current thinking, it's a JCas object, right?

And for backwards compatibility with sofa-unaware annotators, we would
support constructors where the sofa would be set from the JCas current
sofa, right?

And for sofa-aware annotators, we would have additional constructors:
 - taking an additional Sofa argument and/or
 - taking a JCasView argument and/or
 - both?

I think it would be good to post additional sections to the wiki covering
this, and what is proposed to be passed to a component's process method.




The new MyAnnotation( ) would be changed to take JCasViews as the
object.



Yes, that's option (b) from my earlier suggestions:
(a) add a MyAnnotation constructor that takes a Sofa as an argument,
and/or (b) add a MyAnnotation constructor that takes a JCasView
instead of a JCas.



The featureStructure.addToIndexes() could work if the Java cover object
cas/jcas ref object pointed to the view.  For JCas cover objects, we could
keep this pointing to separate instances of the JCas _Type objects, but
that might get a bit expensive - we'd need to have separate instances of
the _Type objects for each view (but that's how it's working today).  For
giant type systems (1000 types), that's a lot of stuff to create - but
maybe there's a lazy way to delay creation and only create those that are
actually used.  This would involve, I think, modifying the generator code
to allow the generator to be absent, and created on first need.



This seems like a lot of trouble to go through to support
addToIndexes().  It seems better if we can get our users to switch to
calling JCasView.addFsToIndexes(fs) instead.  So I think I'd still
like to see addToIndexes() be deprecated and only work for the current
view.


I think I agree with this thought, though I bet our users don't like it
for the single sofa case.  If we can make that case work, I'm OK with
requiring sofa-aware components to say which view they want to index
things in when calling add/removeTo/FromIndexes.





---
Solution 2:
General problem:

new Annotation and add/removeTo/FromIndexes need additional argument:
   - the Sofa for Annotation and
   - the index-set (or view) for the add/removeTo/FromIndexes

sofa/view-unaware components want to ignore this issue (for simplicity).

How about a new framework method that lets users set the current
view/sofa?  This follows the use case that normal view-aware code works
with one view at a time, in terms of adding/removing information.  It has
a bad aspect of being a bit indirect, and side-effect-ish.  So I don't
think I like it as much.  But here's what it might look like:
snip/


That's a lot like what we have today, where users interact with the
JCas interface in order to interact with views.  You've essentially
just replaced getView() with a method like switchView(), that changes
what view the JCas is pointing at.  That seems problematic because it
would also change the view for any other piece of code that might have
had a handle to that JCas.  I take it that's what you meant by
side-effect-ish.
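
To make the side effect concrete, here is a minimal sketch.  The JCas
class below is a hypothetical stand-in (not the real UIMA interface),
switchView() is the made-up method name from the paragraph above, and
"_InitialView" is used only as a placeholder view name.  Any code holding
the same JCas handle sees the view change underneath it:

```java
// Hypothetical stand-in for the JCas interface; switchView() is the
// made-up method from this discussion, not an existing UIMA API.
class JCas {
    private String currentViewName = "_InitialView"; // placeholder default name

    // Changes the view for EVERY holder of this JCas reference.
    void switchView(String viewName) {
        this.currentViewName = viewName;
    }

    String getCurrentViewName() {
        return currentViewName;
    }
}

// Some other piece of code that was handed the same JCas earlier on.
class Bystander {
    private final JCas jcas;

    Bystander(JCas jcas) {
        this.jcas = jcas;
    }

    // Reports which view this code would end up writing to right now.
    String viewItWillWriteTo() {
        return jcas.getCurrentViewName();
    }
}
```

If one component calls jcas.switchView("foo"), a Bystander created earlier
with the same JCas would now report "foo" as well, without having asked
for it.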


Here's a specific proposal:

1) deprecate TOP.addToIndexes().  Document that it's here only to
provide compatibility with older single-sofa code, and does not
support multi-sofa code.  Suggest that users migrate to
JCasView.addFsToIndexes(fs) instead.


(or fs.addToIndexes(JCasView) ?  Note: this is currently an existing API,
if you change JCasView to JCas, since right now we don't have a separate
JCasView )
I agree with this for sofa-aware components.  For sofa-unaware components,
it seems like extra work to require specifying the view.


2) Add the constructors AnnotationBase(JCas, Sofa) and
AnnotationBase(JCasView).  These set the sofa pointer of the new
annotation appropriately.

Yes, these seem appropriate.


3) Deprecate the constructor AnnotationBase(JCas).  It always sets the
Sofa reference to the current view and does not support multi-sofa
code.  Users should migrate to one of the other constructors.

I'm not sure we want sofa-unaware components to write as if they were
aware of all the multi-view, multi-sofa machinery.  If a large percentage
(more than 50%) of the components are sofa-unaware, and most people
climbing the learning 

Re: CAS and CasView - concrete proposal

2007-01-05 Thread Eddie Epstein
 Single-sofa code could be made to work using the same current view
 idea already discussed.  But multi-sofa code will have a problem.

 So I think we need to deprecate addToIndexes().
 Not sure about this - because the current view mechanism would
 seem to make this work, even for multi-sofa.   We could even put in
 code that checked if the item being indexed was a subtype of
 AnnotationBase, and if so, indexed it in the proper view (if the
 current view had a Sofa but it was the wrong one).

Single Sofa code would work for addToIndexes() by always adding to the
index of the initial view. I don't see any way that this method signature
can know which other view to use in a multi-Sofa situation.

Eddie


Re: CAS and CasView - concrete proposal

2007-01-05 Thread Adam Lally

On 1/4/07, Marshall Schor [EMAIL PROTECTED] wrote:

Adam Lally wrote:
 So I think we need to deprecate addToIndexes().
Not sure about this - because the current view mechanism would
seem to make this work, even for multi-sofa.   We could even put in
code that checked if the item being indexed was a subtype of
AnnotationBase, and if so, indexed it in the proper view (if the
current view had a Sofa but it was the wrong one).



Oooh, now I realize this was even more of a problem than I thought.  I
was thinking of code that currently does this:

process(JCas aJCas) {
 JCas myView = aJCas.createView(foo);
 MyAnnotation annot = new MyAnnotation(myView);
 annot.addToIndexes();
}

This indexes annot in myView.

With the new refactoring the first line is now an error because the
return type of createView is now JCasView. To eliminate the errors the
user changes their code to:

process(JCas aJCas) {
 JCasView myView = aJCas.createView(foo);
 MyAnnotation annot = new MyAnnotation(aJCas);
 annot.addToIndexes();
}

The original problem I had was - how do we know what view to index the
annotation in.  I don't think we should just add it to the current
view (which would be different than myView). That seems dangerous with
no deprecation warning.

But I realize there's a bigger problem.  In line 2 when we create the
annotation, what Sofa do we point it at??

It seems like we would need to do (a) add a MyAnnotation constructor
that takes a Sofa as an argument, and/or (b) add a MyAnnotation
constructor that takes a JCasView instead of a JCas.

Both involve changes to the user's JCas-generated code, or require
them to rerun JCasGen, and would require manual updates to any
user-written constructor.  So that's kind of ugly.

-Adam
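
A minimal sketch of what options (a) and (b) might look like side by
side.  All of the types below are simplified, hypothetical stand-ins
written for illustration (they are not the real UIMA or JCasGen-generated
classes); only the constructor shapes are taken from the message above:

```java
// Hypothetical stand-ins for the types discussed in this thread; these are
// NOT the real UIMA classes, only a sketch of the two constructor options.
final class Sofa {
    final String id;

    Sofa(String id) {
        this.id = id;
    }
}

final class JCasView {
    private final JCas jcas;
    private final Sofa sofa;

    JCasView(JCas jcas, Sofa sofa) {
        this.jcas = jcas;
        this.sofa = sofa;
    }

    JCas getJCas() {
        return jcas;
    }

    Sofa getSofa() {
        return sofa;
    }
}

class JCas {
    // Assumption: creating a view also creates the Sofa it is tied to.
    JCasView createView(String name) {
        return new JCasView(this, new Sofa(name));
    }
}

class MyAnnotation {
    final JCas jcas;
    final Sofa sofa;

    // Option (a): the caller names the Sofa explicitly.
    MyAnnotation(JCas jcas, Sofa sofa) {
        this.jcas = jcas;
        this.sofa = sofa;
    }

    // Option (b): the Sofa (and the owning JCas) are taken from the view.
    MyAnnotation(JCasView view) {
        this(view.getJCas(), view.getSofa());
    }
}
```

With either constructor the question "what Sofa do we point it at?" has
an explicit answer: new MyAnnotation(aJCas, myView.getSofa()) and
new MyAnnotation(myView) end up referring to the same Sofa in this sketch.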


Re: CAS and CasView - concrete proposal

2007-01-04 Thread Thilo Goetz

Adam Lally wrote:

I put up a Wiki page giving the suggested breakdown of methods between
the existing interfaces CommonCas, CAS, JCas and new interfaces
CommonCasView, CasView, and JCasView.  Please take a look:
http://cwiki.apache.org/UIMA/casandcasviewinterfaceredesign.html.

-Adam


I would propose the following changes:

- Leave createFeaturePath() and friends at the CAS.  These methods 
require/return CAS-specific data structures and don't need to be 
accessible anywhere else.


- On CasView, remove getJCasView() and getLowLevelCasView().  Those 
should be accessed from the JCas and LowLevelCas, respectively.


- Similarly, on JCasView, remove getCasView() and getLowLevelCasView().

- On the JCas interface, can we remove some of the APIs and just make 
them available on the impl object?  I'm thinking of things like 
putJfsFromCaddr(int, FeatureStructure) and getType(int).


--Thilo


Re: CAS and CasView - concrete proposal

2007-01-04 Thread Adam Lally

On 1/4/07, Thilo Goetz [EMAIL PROTECTED] wrote:

I would propose the following changes:

- Leave createFeaturePath() and friends at the CAS.  These methods
require/return CAS-specific data structures and don't need to be
accessible anywhere else.



Marshall had already moved these to CommonCas, so he must have thought
they were usable from JCas?  I'll let him comment on that, it's
orthogonal to the CAS/view interface split I think.



- On CasView, remove getJCasView() and getLowLevelCasView().  Those
should be accessed from the JCas and LowLevelCas, respectively.

- Similarly, on JCasView, remove getCasView() and getLowLevelCasView().



OK, I think I agree that might be cleaner.  If we do this then we
really have to add getView APIs to LowLevelCas, otherwise there would
be no way at all to access a low-level interface to a view.



- On the JCas interface, can we remove some of the APIs and just make
them available on the impl object?  I'm thinking of things like
putJfsFromCaddr(int, FeatureStructure) and getType(int).



I think these may be called from JCas cover-classes, in which case I
think they need to be on the interface.  Marshall?



Another open issue is the createFS method and variants.  I have left
them off of the view API for now in deference to Thilo's no
convenience methods suggestion, but I'm still a little unsure.
Basically the situation now is if a user has a view handle and wants
to create an FS they need to do view.getCAS().createFS(...);

The upside is that it helps make it clear that FS are owned by the
CAS, not the view.  The downside is that it may annoy users to have to
put in the getCAS call all the time.  Also is it inconsistent that
view.createAnnotation(...) *is* on the view API?  This was done so
that users can create annotations that refer to the Sofa for the view
they're operating on.

What do others think?

-Adam


Re: CAS and CasView - concrete proposal

2007-01-04 Thread Thilo Goetz

Adam Lally wrote:

On 1/4/07, Thilo Goetz [EMAIL PROTECTED] wrote:

...

Another open issue is the createFS method and variants.  I have left
them off of the view API for now in deference to Thilo's no
convenience methods suggestion, but I'm still a little unsure.
Basically the situation now is if a user has a view handle and wants
to create an FS they need to do view.getCAS().createFS(...);


Users will have a CAS around, anyway.  Where else will they have gotten 
the view from?  I'm not even sure we need a CasView.getCAS().  In what 
situation would it ever be unclear what CAS a view belongs to?  Are you 
thinking of passing views instead of CASes to process() calls?




The upside is that it helps make it clear that FS are owned by the
CAS, not the view.  The downside is that it may annoy users to have to
put in the getCAS call all the time.  Also is it inconsistent that
view.createAnnotation(...) *is* on the view API?  This was done so
that users can create annotations that refer to the Sofa for the view
they're operating on.


An annotation is created with respect to a sofa, not a view.  Why not do 
CAS.createAnnotation(Type, int, int, Sofa) or something?  Then 
View.createAnnotation(Type, int, int) would be a convenience method.


--Thilo


Re: CAS and CasView - concrete proposal

2007-01-04 Thread Thilo Goetz

Adam Lally wrote:

The process call would take a CAS.  Inside the body of the process()
method there would be no issue, but I'm thinking about other methods
that the user has implemented that need access to the indexes and also
need to create new FS.  I'm sure there are tons of these.  IMO having
to carry around two object references instead of one would be a pain.
Would we now require that such methods take both the CAS and the
CasView as arguments?  I'm not so happy with that.


No, the CAS is sufficient.  Then call getView() in the method (once, and 
cache the result).  That's what I would do, anyway.



An annotation is created with respect to a sofa, not a view.  Why not do
CAS.createAnnotation(Type, int, int, Sofa) or something?  Then
View.createAnnotation(Type, int, int) would be a convenience method.



Yes, I had wanted to add CAS.createAnnotation(Type, int, int, Sofa) -
thanks for reminding me I had left it off of the Wiki page.  But
still, I think there's even more need for the convenience function
createAnnotation than there is for createFS.  Without it we're left
with
view.getCAS().createAnnotation(type, begin, end, view.getSofa()).


Even I would agree that a convenience method makes sense in this case. 
I just wanted to verify that it actually was a convenience method.


--Thilo
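
A minimal sketch of the relationship agreed on above, assuming simplified
stand-in types (Cas, CasView, Sofa and Annotation here are hypothetical
illustrations, not the UIMA interfaces, and the type is passed as a plain
String for brevity).  The point is only that the view-level method is
pure delegation to the CAS-level one:

```java
// Hypothetical, simplified types: a sketch only, not the UIMA API.
final class Sofa {
    final String id;

    Sofa(String id) {
        this.id = id;
    }
}

final class Annotation {
    final String type;
    final int begin;
    final int end;
    final Sofa sofa;

    Annotation(String type, int begin, int end, Sofa sofa) {
        this.type = type;
        this.begin = begin;
        this.end = end;
        this.sofa = sofa;
    }
}

class Cas {
    // Primary method: the annotation is created with respect to a Sofa,
    // which is passed explicitly.
    Annotation createAnnotation(String type, int begin, int end, Sofa sofa) {
        return new Annotation(type, begin, end, sofa);
    }
}

class CasView {
    private final Cas cas;
    private final Sofa sofa;

    CasView(Cas cas, Sofa sofa) {
        this.cas = cas;
        this.sofa = sofa;
    }

    Cas getCas() {
        return cas;
    }

    Sofa getSofa() {
        return sofa;
    }

    // Convenience method: supplies this view's Sofa and delegates.
    Annotation createAnnotation(String type, int begin, int end) {
        return cas.createAnnotation(type, begin, end, sofa);
    }
}
```

In this sketch view.createAnnotation(type, begin, end) is exactly
view.getCas().createAnnotation(type, begin, end, view.getSofa()), just
shorter to write.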


Re: CAS and CasView - concrete proposal

2007-01-04 Thread Adam Lally

On 1/4/07, Thilo Goetz [EMAIL PROTECTED] wrote:

Adam Lally wrote:
 The process call would take a CAS.  Inside the body of the process()
 method there would be no issue, but I'm thinking about other methods
 that the user has implemented that need access to the indexes and also
 need to create new FS.  I'm sure there are tons of these.  IMO having
 to carry around two object references instead of one would be a pain.
 Would we now require that such methods take both the CAS and the
 CasView as arguments?  I'm not so happy with that.

No, the CAS is sufficient.  Then call getView() in the method (once, and
cache the result).  That's what I would do, anyway.



For multi-sofa annotators, the method may not know which view to get.
So if nothing else the view-name may need to be an argument.  I'm
thinking of some general purpose function  that, say, looks at a Sofa
and creates Person annotations.  I think it would be normal to write
this as annotatePersons(CasView view), so I could call it on different
views if I wanted.  If we were only creating annotations then just the
CasView would be sufficient.  But if the Person annotations needed to
refer to non-annotation FS that also needed to be created, I'd be
stuck unless:
(a) the framework provides view.getCas()
(b) the framework provides view.createFS()
(c) I have to change my method signature to take a CAS as an argument,
in addition to either the view or the view-name.


-Adam
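
A minimal sketch of the annotatePersons(CasView view) case under option
(a).  Everything here is a hypothetical stand-in written for illustration
(the class names, the String-typed createFS, and the "PersonRecord" type
name are assumptions, not UIMA APIs).  It only shows why the method can
keep a single view argument if the view can hand back its CAS:

```java
// Hypothetical, simplified types: a sketch only, not the UIMA API.
final class FeatureStructure {
    final String type;
    FeatureStructure ref; // optional reference to another FS

    FeatureStructure(String type) {
        this.type = type;
    }
}

class Cas {
    private int created = 0;

    // Feature structures are owned by the CAS, not by any one view.
    FeatureStructure createFS(String type) {
        created++;
        return new FeatureStructure(type);
    }

    int getCreatedCount() {
        return created;
    }
}

class CasView {
    private final Cas cas;
    private final String name;
    private final java.util.List<FeatureStructure> index = new java.util.ArrayList<>();

    CasView(Cas cas, String name) {
        this.cas = cas;
        this.name = name;
    }

    // Option (a): the view can hand back the CAS it belongs to.
    Cas getCas() {
        return cas;
    }

    String getName() {
        return name;
    }

    void addFsToIndexes(FeatureStructure fs) {
        index.add(fs);
    }

    int indexSize() {
        return index.size();
    }
}

final class PersonAnnotator {
    // Takes only the view; the CAS is reached through it.
    static void annotatePersons(CasView view) {
        Cas cas = view.getCas();
        FeatureStructure record = cas.createFS("PersonRecord"); // non-annotation FS
        FeatureStructure person = cas.createFS("Person");
        person.ref = record;
        view.addFsToIndexes(person); // indexed only in the view that was passed in
    }
}
```

Calling PersonAnnotator.annotatePersons(...) on two different views of the
same CAS indexes one Person in each view, while all four feature
structures are created on the shared CAS.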


Re: CAS and CasView - concrete proposal

2007-01-04 Thread Marshall Schor

Adam Lally wrote:

On 1/4/07, Thilo Goetz [EMAIL PROTECTED] wrote:

I would propose the following changes:

- Leave createFeaturePath() and friends at the CAS.  These methods
require/return CAS-specific data structures and don't need to be
accessible anywhere else.



Marshall had already moved these to CommonCas, so he must have thought
they were usable from JCas?  I'll let him comment on that, it's
orthogonal to the CAS/view interface split I think.


The reason they were put into the CommonCas is because I thought users of
either the CAS or JCas interfaces might want to use createFeaturePath.
These are used for filtered iterators, I think.  Here's an example, from a
parser that produces various kinds of nodes, done in JCas; it uses the
older APIs, so has to get a CAS ref at the start:

   CAS cas = jcas.getCas();
   // filter the iterator to only return top parse frames
   // Top parse frames have Sgc.POS.incomplete or Sgc.POS.Top as
   // slot-filled type

   // Start by getting the constraint factory from the CAS.
   ConstraintFactory cf = cas.getConstraintFactory();
   // Create empty path.
   FeaturePath path = cas.createFeaturePath();

   // Add XsgParse slotName feature to path, creating one-element path.
   path.addFeature(
       ((XsgParse_Type) jcas.getType(XsgParse.typeIndexID)).casFeat_slotName);

   FSStringConstraint slotIsTop = cf.createStringConstraint();
   FSStringConstraint slotIsIncomplete = cf.createStringConstraint();

   slotIsTop.equals(Sgc.POS.top.toString());
   slotIsIncomplete.equals(Sgc.POS.incomplete.toString());

   FSMatchConstraint embeddedTop = cf.embedConstraint(path, slotIsTop);
   FSMatchConstraint embeddedInc = cf.embedConstraint(path, slotIsIncomplete);

   FSMatchConstraint topOrInc = cf.or(embeddedTop, embeddedInc);

   // Create a filtered iterator from some annotation iterator.
   return cas.createFilteredIterator(
       jcas.getJFSIndexRepository()
           .getAnnotationIndex(XsgParse.type)
           .iterator(),
       topOrInc);

There is one kludgy part - getting the Feature value from the JCas
structures.






- On CasView, remove getJCasView() and getLowLevelCasView().  Those
should be accessed from the JCas and LowLevelCas, respectively.

- Similarly, on JCasView, remove getCasView() and getLowLevelCasView().



OK, I think I agree that might be cleaner.  If we do this then we
really have to add getView APIs to LowLevelCas, otherwise there would
be no way at all to access a low-level interface to a view.



- On the JCas interface, can we remove some of the APIs and just make
them available on the impl object?  I'm thinking of things like
putJfsFromCaddr(int, FeatureStructure) and getType(int).



I think these may be called from JCas cover-classes, in which case I
think they need to be on the interface.  Marshall?


The putJfsFromCaddr is called from the JCas cover-classes.  It could
be in impl, though - we could change JCasGen and the migration tool.

The getType(int) is used by users - should remain in the API.  This method
gives JCas users efficient access to the Type object corresponding to a
JCas cover class:  getType(MyJCasCoverClass.type);
the Type object is needed by some APIs. 




Another open issue is the createFS method and variants.  I have left
them off of the view API for now in deference to Thilo's no
convenience methods suggestion, but I'm still a little unsure.
Basically the situation now is if a user has a view handle and wants
to create an FS they need to do view.getCAS().createFS(...);

The upside is that it helps make it clear that FS are owned by the
CAS, not the view.  The downside is that it may annoy users to have to
put in the getCAS call all the time.  Also is it inconsistent that
view.createAnnotation(...) *is* on the view API?  This was done so
that users can create annotations that refer to the Sofa for the view
they're operating on.

What do others think?


Users have already complained about this kind of thing.  They have said
they don't want to have to follow dereferencing chains to reach an object
that finally has a method they need - they want the framework to do that
for them.

I don't think we have compelling enough reasons to require users to
a) getViews from CASes, and also then b) dereference from the View back
to CASes to do the real work of creating / accessing / updating Feature
Structures.

I think that the APIs for users should focus on what the users need to
get done, with an eye toward economizing on the verbose-ness of the
resulting code (sorry - I mean to say, the user's code should be very
readable - and having dereferencing chains makes it less readable, I
think).

-Marshall



Re: CAS and CasView - concrete proposal

2007-01-04 Thread Thilo Goetz

Adam Lally wrote:

On 1/4/07, Thilo Goetz [EMAIL PROTECTED] wrote:

Adam Lally wrote:
 The process call would take a CAS.  Inside the body of the process()
 method there would be no issue, but I'm thinking about other methods
 that the user has implemented that need access to the indexes and also
 need to create new FS.  I'm sure there are tons of these.  IMO having
 to carry around two object references instead of one would be a pain.
 Would we now require that such methods take both the CAS and the
 CasView as arguments?  I'm not so happy with that.

No, the CAS is sufficient.  Then call getView() in the method (once, and
cache the result).  That's what I would do, anyway.



For multi-sofa annotators, the method may not know which view to get.
So if nothing else the view-name may need to be an argument.  I'm
thinking of some general purpose function  that, say, looks at a Sofa
and creates Person annotations.  I think it would be normal to write
this as annotatePersons(CasView view), so I could call it on different
views if I wanted.  If we were only creating annotations then just the
CasView would be sufficient.  But if the Person annotations needed to
refer to non-annotation FS that also needed to be created, I'd be
stuck unless:
(a) the framework provides view.getCas()
(b) the framework provides view.createFS()
(c) I have to change my method signature to take a CAS as an argument,
in addition to either the view or the view-name.


-Adam


Users will have to change their method signatures anyway, as we're 
breaking multi-view code.  However, I can see the case for (a).


Re: CAS and CasView - concrete proposal

2007-01-04 Thread Adam Lally

This note is really from Marshall.  He's having email trouble so I
posted it on his behalf.


On 1/4/07, Thilo Goetz [EMAIL PROTECTED] wrote:

I would propose the following changes:

- Leave createFeaturePath() and friends at the CAS.  These methods
require/return CAS-specific data structures and don't need to be
accessible anywhere else.



Marshall had already moved these to CommonCas, so he must have thought
they were usable from JCas?  I'll let him comment on that, it's
orthogonal to the CAS/view interface split I think.


The reason they were put into the CommonCas is because I thought users of
either the CAS or JCas interfaces might want to use createFeaturePath.
These are used for filtered iterators, I think.  Here's an example, from a
parser that produces various kinds of nodes, done in JCas; it uses the
older APIs, so has to get a CAS ref at the start:

  CAS cas = jcas.getCas();
  // filter the iterator to only return top parse frames
  // Top parse frames have Sgc.POS.incomplete or Sgc.POS.Top as
  // slot-filled type

  // Start by getting the constraint factory from the CAS.
  ConstraintFactory cf = cas.getConstraintFactory();
  // Create empty path.
  FeaturePath path = cas.createFeaturePath();

  // Add XsgParse slotName feature to path, creating one-element path.
  path.addFeature(
      ((XsgParse_Type) jcas.getType(XsgParse.typeIndexID)).casFeat_slotName);

  FSStringConstraint slotIsTop = cf.createStringConstraint();
  FSStringConstraint slotIsIncomplete = cf.createStringConstraint();

  slotIsTop.equals(Sgc.POS.top.toString());
  slotIsIncomplete.equals(Sgc.POS.incomplete.toString());

  FSMatchConstraint embeddedTop = cf.embedConstraint(path, slotIsTop);
  FSMatchConstraint embeddedInc = cf.embedConstraint(path, slotIsIncomplete);

  FSMatchConstraint topOrInc = cf.or(embeddedTop, embeddedInc);

  // Create a filtered iterator from some annotation iterator.
  return cas.createFilteredIterator(
      jcas.getJFSIndexRepository()
          .getAnnotationIndex(XsgParse.type)
          .iterator(),
      topOrInc);

There is one kludgy part - getting the Feature value from the JCas
structures.





- On CasView, remove getJCasView() and getLowLevelCasView().  Those
should be accessed from the JCas and LowLevelCas, respectively.

- Similarly, on JCasView, remove getCasView() and getLowLevelCasView().



OK, I think I agree that might be cleaner.  If we do this then we
really have to add getView APIs to LowLevelCas, otherwise there would
be no way at all to access a low-level interface to a view.



- On the JCas interface, can we remove some of the APIs and just make
them available on the impl object?  I'm thinking of things like
putJfsFromCaddr(int, FeatureStructure) and getType(int).



I think these may be called from JCas cover-classes, in which case I
think they need to be on the interface.  Marshall?


The putJfsFromCaddr is called from the JCas cover-classes.  It could
be in impl, though - we could change JCasGen and the migration tool.

The getType(int) is used by users - should remain in the API.  This method
gives JCas users efficient access to the Type object corresponding to a
JCas cover class:  getType(MyJCasCoverClass.type);
the Type object is needed by some APIs.




Another open issue is the createFS method and variants.  I have left
them off of the view API for now in deference to Thilo's no
convenience methods suggestion, but I'm still a little unsure.
Basically the situation now is if a user has a view handle and wants
to create an FS they need to do view.getCAS().createFS(...);

The upside is that it helps make it clear that FS are owned by the
CAS, not the view.  The downside is that it may annoy users to have to
put in the getCAS call all the time.  Also is it inconsistent that
view.createAnnotation(...) *is* on the view API?  This was done so
that users can create annotations that refer to the Sofa for the view
they're operating on.

What do others think?


Users have already complained about this kind of thing.  They have said
they don't want to have to follow dereferencing chains to reach an object
that finally has a method they need - they want the framework to do that
for them.

I don't think we have compelling enough reasons to require users to
a) getViews from CASes, and also then b) dereference from the View back
to CASes to do the real work of creating / accessing / updating Feature
Structures.

I think that the APIs for users should focus on what the users need to
get done, with an eye toward economizing on the verbose-ness of the
resulting code (sorry - I mean to say, the user's code should be very
readable - and having dereferencing chains makes it less readable, I
think).

-Marshall


Re: CAS and CasView - concrete proposal

2007-01-04 Thread Adam Lally

On 1/4/07, Marshall Schor [EMAIL PROTECTED] wrote:

 - On the JCas interface, can we remove some of the APIs and just make
 them available on the impl object?  I'm thinking of things like
 putJfsFromCaddr(int, FeatureStructure) and getType(int).


 I think these may be called from JCas cover-classes, in which case I
 think they need to be on the interface.  Marshall?

The putJfsFromCaddr is called from the JCas cover-classes.  It could
be in impl, though - we could change JCasGen and the migration tool.



Then the JCas cover class would have to do a typecast, and require
that the JCas passed to its constructor was in fact a JCasImpl, which
doesn't seem good (unless I'm missing something).


I don't think we have compelling enough reasons to require users to
a) getViews from CASes, and also then b) dereference from the View back
to CASes to do the real work of creating / accessing / updating Feature
Structures.

I think that the APIs for users should focus on what the users need to
get done, with an eye toward economizing on the verbose-ness of the
resulting code (sorry - I mean to say, the user's code should be very
readable - and having dereferencing chains makes it less readable, I
think).



I think users object more to the write-ability, which is partly how
many keystrokes you need but also just how easy it is to learn and
remember what it is you're supposed to write.

Readability is subjective... We need to be clear on what the
concepts we're trying to communicate are.  The code
view.getCAS().createFS(...) may be considered more readable if it's an
important concept that FeatureStructures are created on the CAS, not
on a view.

However, are we sure that's such an important concept?  What if we say
that a CasView is a particular window on a CAS, a way of looking at
it.  That seems kind of consistent with the word view.  And it
doesn't then seem so bad that you can make changes to the CAS through
the view, such as adding FeatureStructures.

-Adam


Re: CAS and CasView - concrete proposal

2007-01-04 Thread Adam Lally

On 1/4/07, Marshall Schor [EMAIL PROTECTED] wrote:

Adam Lally wrote:
 FYI I made updates to the Wiki page - see my comments on the page for
 details.

 -Adam


I probably just missed it, but given a JCasView, how do I get the
corresponding JCas?



Thanks for catching that omission. I have added JCasView.getJCas() to the Wiki.

-Adam


Re: CAS and CasView - concrete proposal

2007-01-04 Thread Adam Lally

There's another issue with JCas we haven't considered yet - the
addToIndexes() method on JCasGen-erated classes.  When this is called,
it needs to know what index repository (what view) to index them in.

Currently, this uses whichever view (meaning a JCas instance) was
passed to the constructor when the object was created.  With this
refactoring, new objects would presumably be given a reference to the
one-and-only JCas, never a JCasView.

Single-sofa code could be made to work using the same current view
idea already discussed.  But multi-sofa code will have a problem.

So I think we need to deprecate addToIndexes().  We can add a new
method addToIndexes(JCasView) in its place, and/or require the use of
JCasView.addToIndexes(FS) instead.

-Adam
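
A minimal sketch of the deprecation idea above, assuming simplified
stand-in classes (this TOP, JCas and JCasView are hypothetical
illustrations, not the real UIMA or JCasGen-generated code, and
"_InitialView" is only a placeholder name for the current view).  The
no-argument method keeps working for single-sofa code by falling back to
the current view, while multi-sofa code names the view explicitly:

```java
// Hypothetical, simplified types: a sketch only, not the UIMA API.
class JCasView {
    private final String name;
    private final java.util.Set<TOP> index = new java.util.HashSet<>();

    JCasView(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // View-side form: JCasView.addToIndexes(FS).
    void addToIndexes(TOP fs) {
        index.add(fs);
    }

    boolean contains(TOP fs) {
        return index.contains(fs);
    }
}

class JCas {
    private final JCasView currentView = new JCasView("_InitialView");

    JCasView getCurrentView() {
        return currentView;
    }

    JCasView createView(String name) {
        return new JCasView(name);
    }
}

class TOP {
    private final JCas jcas;

    TOP(JCas jcas) {
        this.jcas = jcas;
    }

    /**
     * @deprecated kept for single-sofa code only; always indexes in the
     *             current view. Use addToIndexes(JCasView) instead.
     */
    @Deprecated
    void addToIndexes() {
        jcas.getCurrentView().addToIndexes(this);
    }

    // Replacement: the caller says which view to index in.
    void addToIndexes(JCasView view) {
        view.addToIndexes(this);
    }
}
```

In this sketch fs.addToIndexes(view) and view.addToIndexes(fs) do the same
thing, which is the redundancy between the two alternatives.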


Re: CAS and CasView - concrete proposal

2007-01-04 Thread Marshall Schor

Adam Lally wrote:

There's another issue with JCas we haven't considered yet - the
addToIndexes() method on JCasGen-erated classes.  When this is called,
it needs to know what index repository (what view) to index them in.

Same for removeFromIndexes() of course :-)


Currently, this uses whichever view (meaning a JCas instance) was
passed to the constructor when the object was created.  With this
refactoring, new objects would presumably be given a reference to the
one-and-only JCas, never a JCasView.

Single-sofa code could be made to work using the same current view
idea already discussed.  But multi-sofa code will have a problem.

So I think we need to deprecate addToIndexes().  

Not sure about this - because the current view mechanism would
seem to make this work, even for multi-sofa.   We could even put in
code that checked if the item being indexed was a subtype of
AnnotationBase, and if so, indexed it in the proper view (if the
current view had a Sofa but it was the wrong one).

To intentionally index a JCas cover object in another View, there is
always the otherJCasView.addToIndexes(FS) method.

So I'm not sure about the value of deprecating this. 

We can add a new
method addToIndexes(JCasView) in its place, 

I like (prefer) this, but admit it seems redundant with the following

and/or require the use of
JCasView.addToIndexes(FS) instead.


-Marshall


Re: CAS and CasView - concrete proposal

2007-01-03 Thread Adam Lally

I put up a Wiki page giving the suggested breakdown of methods between
the existing interfaces CommonCas, CAS, JCas and new interfaces
CommonCasView, CasView, and JCasView.  Please take a look:
http://cwiki.apache.org/UIMA/casandcasviewinterfaceredesign.html.

-Adam


Re: CAS and CasView - concrete proposal

2007-01-02 Thread Adam Lally

On 12/30/06, Thilo Goetz [EMAIL PROTECTED] wrote:

So your proposal is to leave things as they are, except that we call
some of the things that we used to call a CAS a CasView.  We're not
going to touch how indexing works, at least conceptually.  We could
implement this proposal by simply making the CASImpl class implement the
CasView interface and we would be more or less done.

Is that a correct interpretation, or did I miss something?



Pretty much... the basic objective was to split CAS and CasView so it
would be apparent when you were looking at the whole CAS and when
you were looking at a view.

To clarify some specific differences between CAS and CasView:

* CasView doesn't have getView(...) methods
* CAS methods for sofa/index access are deprecated but forward to the
current view (contrast with the situation today where they just
don't work on the base CAS - returning null or throwing exceptions)
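
A minimal sketch of the two differences listed above, assuming hypothetical SketchCas and SketchView types (they are not the real interfaces, and the view name "initial" is made up): getView exists only on the CAS, and the deprecated sofa-access methods on the CAS forward to whatever the current view is.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch types; not the real CAS or CasView interfaces.
final class ForwardingSketch {

    /** A view: here just a name plus the text of its Sofa. No getView(...) here. */
    static final class SketchView {
        private final String name;
        private String documentText;

        SketchView(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }

        String getDocumentText() {
            return documentText;
        }

        void setDocumentText(String text) {
            this.documentText = text;
        }
    }

    /** The whole CAS: owns the views and remembers which one is current. */
    static final class SketchCas {
        private final Map<String, SketchView> views =
                new LinkedHashMap<String, SketchView>();
        private SketchView currentView;

        SketchCas() {
            currentView = createView("initial"); // assumed name for the first view
        }

        SketchView createView(String name) {
            SketchView view = new SketchView(name);
            views.put(name, view);
            return view;
        }

        SketchView getView(String name) {
            SketchView view = views.get(name);
            if (view == null) {
                throw new IllegalArgumentException("no such view: " + name);
            }
            return view;
        }

        // In the proposal the framework would choose this per annotator.
        void setCurrentView(SketchView view) {
            currentView = view;
        }

        /** Deprecated on the CAS, but still works by forwarding to the current view. */
        @Deprecated
        String getDocumentText() {
            return currentView.getDocumentText();
        }

        /** Deprecated on the CAS, but still works by forwarding to the current view. */
        @Deprecated
        void setDocumentText(String text) {
            currentView.setDocumentText(text);
        }
    }

    public static void main(String[] args) {
        SketchCas cas = new SketchCas();
        cas.setDocumentText("some text");
        System.out.println(cas.getView("initial").getDocumentText());
    }
}
```
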

-Adam


Re: CAS and CasView - concrete proposal

2007-01-02 Thread Thilo Goetz

Adam Lally wrote:

On 12/30/06, Thilo Goetz [EMAIL PROTECTED] wrote:

So your proposal is to leave things as they are, except that we call
some of the things that we used to call a CAS a CasView.  We're not
going to touch how indexing works, at least conceptually.  We could
implement this proposal by simply making the CASImpl class implement the
CasView interface and we would be more or less done.

Is that a correct interpretation, or did I miss something?



Pretty much... the basic objective was to split CAS and CasView so it
would be apparent when you were looking at the whole CAS and when
you were looking at a view.

To clarify some specific differences between CAS and CasView:

* CasView doesn't have getView(...) methods
* CAS methods for sofa/index access are deprecated but forward to the
current view (contrast with the situation today where they just
don't work on the base CAS - returning null or throwing exceptions)

-Adam


I wouldn't mind doing this as a first step, but I'm concerned about the 
future.  If we need to support this approach going forward, I would 
prefer if we could answer the questions about the relation between the 
CAS and CasViews first: how are indexes in the CAS related to indexes in 
 CasViews?  If we're ok with maybe changing this again in the next 
release, I'm ok with starting like this.


--Thilo


Re: CAS and CasView - concrete proposal

2007-01-02 Thread Adam Lally

On 1/2/07, Thilo Goetz [EMAIL PROTECTED] wrote:

I wouldn't mind doing this as a first step, but I'm concerned about the
future.  If we need to support this approach going forward, I would
prefer if we could answer the questions about the relation between the
CAS and CasViews first: how are indexes in the CAS related to indexes in
  CasViews?  If we're ok with maybe changing this again in the next
release, I'm ok with starting like this.



This proposal only has indexes in CasViews, not indexes that belong
directly to the CAS. (Unless you meant the deprecated index-access
methods on CAS that use the current view.)

I would also like to figure out if there should be such a thing as
indexes that belong directly to the CAS (global indexes?), but it
seemed like we were too far from a consensus on that to get anything
done for 2.1.

We can always add additional methods to CAS (i.e.
getGlobalIndexRepository()) in a later version if we decide that's
right.  And we probably can't redefine any of the existing indexing
methods on CAS without breaking a lot of code anyway.  So it smells
like starting with this proposal will not get in the way of future
enhancements.

-Adam


Re: CAS and CasView - concrete proposal

2007-01-02 Thread Adam Lally

On 1/2/07, Marshall Schor [EMAIL PROTECTED] wrote:

I think this proposal also has one set of index definitions, and each view
gets its own private set of index-instances for these definitions.



Correct.
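
As a rough illustration of "one set of index definitions, private index instances per view", the sketch below uses hypothetical types (IndexDefinitions, ViewIndexRepository) and represents feature structures as plain ints. It is not how the real index repository is implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch types; the real index repository is far richer.
final class IndexPerViewSketch {

    /** Index definitions are declared once for the whole CAS (here: just labels). */
    static final class IndexDefinitions {
        private final List<String> labels = new ArrayList<String>();

        void define(String label) {
            labels.add(label);
        }

        List<String> labels() {
            return labels;
        }
    }

    /** Each view instantiates every shared definition privately. */
    static final class ViewIndexRepository {
        private final Map<String, List<Integer>> instances =
                new HashMap<String, List<Integer>>();

        ViewIndexRepository(IndexDefinitions definitions) {
            for (String label : definitions.labels()) {
                instances.put(label, new ArrayList<Integer>());
            }
        }

        /** Adds the FS (an int reference in this sketch) to every index of this view. */
        void addFS(int fsRef) {
            for (List<Integer> index : instances.values()) {
                index.add(fsRef);
            }
        }

        int size(String label) {
            List<Integer> index = instances.get(label);
            if (index == null) {
                throw new IllegalArgumentException("undefined index: " + label);
            }
            return index.size();
        }
    }

    public static void main(String[] args) {
        IndexDefinitions definitions = new IndexDefinitions();
        definitions.define("AnnotationIndex");

        ViewIndexRepository viewA = new ViewIndexRepository(definitions);
        ViewIndexRepository viewB = new ViewIndexRepository(definitions);
        viewA.addFS(42);

        System.out.println("viewA: " + viewA.size("AnnotationIndex"));
        System.out.println("viewB: " + viewB.size("AnnotationIndex"));
    }
}
```
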



Will the methods not really associated with a CAS object (they are or
could be static methods) still be on the CAS or CommonCas:

 createFilteredIterator, getConstraintFactory, createFeaturePath,
createFeatureValuePath, and fs2listIterator



I'm not sure they can be static, as they may depend on the type
system.  Some of them, anyway.  I think these should still be on
(Common)CAS, but might also be on CasView for convenience.  This is
seeming like a slippery slope, though, pretty soon everything is in
two places.



I suggest that the methods that belong in the CasView be left there
(deprecated)
to operate on the current view:
 get/set for Sofa things like DocumentText, SofaDataURI, etc.,
 getSofa
 getIndexRepository
 getAnnotationIndex(type)
 get/set associated with DocumentAnnotation (I think there is one of
these per view - agree?)
 add/removeFsTo/FromIndexes



Agree.



For things like createView and getView(String or FS) - I'm ok with
requiring these to be on the CAS Api only, but
also wouldn't object if they were on the CasView API for convenience.



I prefer leaving them off the CasView.



The CAS has a getLowLevelCAS() method; the low level CAS includes both
things for FSs and also for IndexRepositories.
The index repository things should be looked at carefully to see if they
should go with the view (with perhaps convenience functions working
on the current view in the CAS Api).



Good point... we may need a LowLevelCasView.  (Currently the situation
with LowLevelCas is the same as with CAS - an instance of LowLevelCas
could either be referring to a view or to the base CAS.)

And don't forget JCas, where we'll need a JCasView.

When I get a minute I may try to compile a complete list of proposed
changes, maybe on the Wiki.



Finally :-) we have getSofaIterator...  on the CAS Api (I'm not
distinguishing between CAS and CommonCas APIs here).
I suggest for 2.1 we lock the association between 1 view == 1 sofa.
So we won't need getViewIterator, nor have to
figure out how to name views separately from Sofas.  Is that too
restrictive?



I agree, Sofas and Views are still 1-1 for the time being.  And
getSofaIterator would only be on CAS, not CasView.

-Adam


Re: CAS and CasView - concrete proposal

2007-01-02 Thread Thilo Goetz

Marshall Schor wrote:
snip
Will the methods not really associated with a CAS object (they are or
could be static methods) still be on the CAS or CommonCas:

 createFilteredIterator, getConstraintFactory, createFeaturePath,
createFeatureValuePath, and fs2listIterator


I suggest that the methods that belong in the CasView be left there
(deprecated)
to operate on the current view:
 get/set for Sofa things like DocumentText, SofaDataURI, etc.,
 getSofa
 getIndexRepository
 getAnnotationIndex(type)
 get/set associated with DocumentAnnotation (I think there is one of
these per view - agree?)
 add/removeFsTo/FromIndexes

Adam suggested that the methods that belong in the CAS for creating
FeatureStructures and get/setting their fields be made to work in the
CasView API, for convenience; I agree with that.

For things like createView and getView(String or FS) - I'm ok with
requiring these to be on the CAS Api only, but
also wouldn't object if they were on the CasView API for convenience.


The time to add convenience methods is either a) never, or b) when the 
current APIs have been found inconvenient.  Let's figure out where 
things belong first, and gather some experience with the setup.  Then, 
if things are so inconvenient that we need to sacrifice some conceptual 
clarity, let's by all means introduce some carefully selected 
convenience functions.  I vote for no convenience functions regarding 
CAS/CasView functionality in this release.  Let's get people's heads 
wrapped around the concepts first before we start to muddle them with 
convenience functions.


--Thilo



Re: CAS and CasView - concrete proposal

2007-01-02 Thread Thilo Goetz

Adam Lally wrote:

On 1/2/07, Marshall Schor [EMAIL PROTECTED] wrote:

snip

The CAS has a getLowLevelCAS() method; the low level CAS includes both
things for FSs and also for IndexRepositories.
The index repository things should be looked at carefully to see if they
should go with the view (with perhaps convenience functions working
on the current view in the CAS Api).



Good point... we may need a LowLevelCasView.  (Currently the situation
with LowLevelCas is the same as with CAS - an instance of LowLevelCas
could either be referring to a view or to the base CAS.)


Unfortunately, the low-level CAS is missing the base CAS functionality.
All the sofa/view stuff was implemented at the CAS level only.  This
is something that should be fixed.


--Thilo



Re: CAS and CasView - concrete proposal

2007-01-02 Thread Marshall Schor

Thilo Goetz wrote:
snip
Unfortunately, the low-level CAS is missing the base CAS
functionality.  All the sofa/view stuff was implemented at the CAS
level only.  This is something that should be fixed.


My understanding of the low-level interfaces is that they are there to 
support the no-Java-object-for-CAS-object scenario, representing CAS 
objects as ints.


The ll (low level) APIs come in checking and non-checking versions too.
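
To make the "CAS objects as ints" idea concrete, here is a small sketch under heavy assumptions: LowLevelSketch, its two-cell heap layout, and its method names are all hypothetical and are not the LowLevelCAS API. It only shows a feature structure addressed by an int reference, with one unchecked accessor and one checked accessor.

```java
import java.util.Arrays;

// Hypothetical sketch; layout and method names are not the LowLevelCAS API.
final class LowLevelSketch {
    static final int NULL_REF = 0;   // cell 0 is reserved as the null reference

    // Each FS occupies two cells: [typeCode, intValue]. FS references are odd.
    private int[] heap = new int[8];
    private int next = 1;

    /** Allocates an FS of the given type code and returns its int reference. */
    int createFS(int typeCode) {
        if (next + 2 > heap.length) {
            heap = Arrays.copyOf(heap, heap.length * 2);
        }
        int ref = next;
        heap[ref] = typeCode;
        next += 2;
        return ref;
    }

    /** Non-checking write: trusts the caller to pass a valid reference. */
    void setIntUnchecked(int ref, int value) {
        heap[ref + 1] = value;
    }

    /** Non-checking read: trusts the caller to pass a valid reference. */
    int getIntUnchecked(int ref) {
        return heap[ref + 1];
    }

    /** Checking read: validates the reference and the expected type first. */
    int getIntChecked(int ref, int expectedTypeCode) {
        boolean inRange = ref > NULL_REF && ref + 1 < next;
        boolean aligned = (ref & 1) == 1;
        if (!inRange || !aligned || heap[ref] != expectedTypeCode) {
            throw new IllegalArgumentException("invalid FS reference or type: " + ref);
        }
        return heap[ref + 1];
    }

    public static void main(String[] args) {
        LowLevelSketch ll = new LowLevelSketch();
        int tokenType = 7;                 // made-up type code
        int fs = ll.createFS(tokenType);   // no Java object is created for the FS
        ll.setIntUnchecked(fs, 123);
        System.out.println(ll.getIntChecked(fs, tokenType));
    }
}
```
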

The Sofa/View stuff would need to be implemented as low-level I think
only if we conclude there is a need for a no-Java-object-for-CAS-object
scenario here.  By this I mean things like getView(SofaFS) - we could
imagine a version of this which, instead of taking a Java cover object
(JCas or CAS) for the Sofa instance in the CAS, would take an int.
I don't think this is really needed, though, because I have a hard time
imagining the use case where this would make an observable difference
(but maybe I'm wrong here).


Sofa/View stuff would need to be there to allow the ll APIs to work on 
the things they were designed for such as creating new Feature 
Structures, iterating over them, etc, but in a View.


-Marshall


CAS and CasView - concrete proposal

2006-12-29 Thread Adam Lally

Well, the concrete may not have quite set yet... but here goes:

1.  Goals

The following are confusing (or some might say, broken)

(a) the interface CAS can be an interface to either the whole CAS or to a
view.  Methods like this are poor:
CAS view = cas.getView(name);

(b) the logic determining which CAS (a view or the whole CAS) gets
passed to an annotator's process method is needlessly complicated.

We would like to improve this in v2.1, so we have 3 weeks (starting
now) to implement it.  It's acceptable if what we do breaks multi-view
annotators/applications, but it cannot break single-view
annotators/applications.

We want whatever we do to be easier to document and explain to users
than what we currently have.


2. Proposed Solution

We don't plan to change the fundamental design of views at this point
- there isn't time and it's too controversial.  A view still consists
of an index repository and a Sofa.  (Yes, I know someday a view may
not have a Sofa - but for now, it does.)

A. New CasView interface

We create a new interface CasView.  All of the CAS.getView() methods
will now return type CasView (instead of CAS).

The CasView interface will contain all of the sofa-access methods and
indexing-related methods that are on the CAS interface.

A more controversial question is whether you can create FS from a
CasView - i.e., does the method CasView.createFS(Type) exist?  What
about CasView.createAnnotation(Type, int begin, int end)?

In some previous discussions we said no, FS creation is on the CAS
only.  This communicates to the user that, logically, FS creation is
an operation on the CAS as a whole.  However after thinking about this
more I think that may be too inconvenient.  If I have a handle to a
view and want to create an FS I'd have to do:

myView.getCAS().createFS(type);

which is a little tedious. (and this assumes we have CasView.getCAS(),
without which it is much worse).  And what about annotations, which
need a Sofa reference, so we would need something like:

myView.getCAS().createAnnotation(type, myView.getSofa(), begin, end)

which is too ugly to consider.

All in all I think it would be better to allow FS creation on the
CasView interface as well as the CAS interface.  I think we can
explain this.  A view is a window into a CAS - a particular way of
looking at it - it should be a fully functional interface for
interacting with the CAS from that viewpoint.  And that would include
creating new FS.
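
The convenience argument above could look roughly like this. Everything here is a hypothetical sketch (SketchCas, SketchView, SketchAnnotation, with a Sofa reduced to a string id): the view's createAnnotation simply delegates to the CAS and fills in its own Sofa, so the long chain and the short form produce equivalent results.

```java
// Hypothetical sketch types; not the proposed CasView interface itself.
final class ViewCreationSketch {

    /** A minimal annotation: type name, owning Sofa id, and a span. */
    static final class SketchAnnotation {
        final String type;
        final String sofaId;
        final int begin;
        final int end;

        SketchAnnotation(String type, String sofaId, int begin, int end) {
            this.type = type;
            this.sofaId = sofaId;
            this.begin = begin;
            this.end = end;
        }
    }

    /** The whole CAS: creation here needs an explicit Sofa. */
    static final class SketchCas {
        private int created = 0;

        SketchAnnotation createAnnotation(String type, String sofaId, int begin, int end) {
            created++;
            return new SketchAnnotation(type, sofaId, begin, end);
        }

        SketchView createView(String sofaId) {
            return new SketchView(this, sofaId);
        }

        int createdCount() {
            return created;
        }
    }

    /** A view: knows its CAS and its Sofa, so it can supply the Sofa itself. */
    static final class SketchView {
        private final SketchCas cas;
        private final String sofaId;

        SketchView(SketchCas cas, String sofaId) {
            this.cas = cas;
            this.sofaId = sofaId;
        }

        SketchCas getCAS() {
            return cas;
        }

        String getSofa() {
            return sofaId;
        }

        /** Short form: delegates to the CAS, passing this view's Sofa. */
        SketchAnnotation createAnnotation(String type, int begin, int end) {
            return cas.createAnnotation(type, sofaId, begin, end);
        }
    }

    public static void main(String[] args) {
        SketchCas cas = new SketchCas();
        SketchView myView = cas.createView("text");

        // Long form, as it would look without creation methods on the view.
        SketchAnnotation a =
                myView.getCAS().createAnnotation("Token", myView.getSofa(), 0, 5);
        // Short form, with creation available on the view.
        SketchAnnotation b = myView.createAnnotation("Token", 0, 5);

        System.out.println(a.sofaId.equals(b.sofaId));
    }
}
```
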



B. Backwards Compatibility

To meet the goal of being compatible with single-view annotators, we
will use the following strategy:

The idea is that a CAS has a current view.  Any methods on the CAS
that are view-oriented will apply to the current view.  This includes
but is not limited to:
getSofa()
getDocumentText()
getIndexRepository()
addFsToIndexes()
createAnnotation(int begin, int end) //needs to know which Sofa to refer to


The current view is determined by the framework and can be different
for different annotators.  For single-sofa annotators the current view
is the view that the annotator should process, as determined by sofa
mappings in the usual way.

Note that this approach also allows single-sofa application code to
work.  We have a lot of code that does:
AnalysisEngine ae = ...
CAS cas = ae.newCAS();
cas.setDocumentText(someString);
ae.process(cas);

and it would be really nice if this continues to work.

We could deprecate these APIs and encourage people to switch to the
view-oriented APIs, which would be something like:
AnalysisEngine ae = ...
CAS cas = ae.newCAS();
CasView initialView = cas.getInitialView();
initialView.setDocumentText(someString);
ae.process(cas);


Re: CAS and CasView - concrete proposal

2006-12-29 Thread Marshall Schor

More on hierarchies of implementation objects, and saving the user from
writing dereferencing chains:


Suppose we divide the CAS methods into those which would just not make sense
on the CasView API, and others.  In the same spirit of pleasing the users by
avoiding what they could see as unnecessary dereferencing, would we make
*all* the others also available via delegation to the CasView API?

In other words, how many dereferencings would we want to shave off for
users who wanted to get Type?

Would they write:

aCasView.getCas().getTypeSystem().getType(String) or
aCasView.getTypeSystem().getType(String) or
aCasView.getType(String)?

And for indexes:

aCasView.getIndexRepository().getIndex(name, aType); or
aCasView.getIndex(name, aType);

Users have often requested the shorter forms - they seem to often consider
how the impl has organized these things into object hierarchies as an
implementation detail they'd rather not be bothered with.  JCas (for better
or worse) has some of these shorter forms (for getting indexes, for
instance).


-Marshall

Adam Lally wrote:

Well, the concrete may not have quite set yet... but here goes:

1.  Goals

The following are confusing (or some might say, broken)

(a) the interface CAS can be an interface to either the whole CAS or to a
view.  Methods like this are poor:
CAS view = cas.getView(name);

(b) the logic determining which CAS (a view or the whole CAS) gets
passed to an annotator's process method is needlessly complicated.

We would like to improve this in v2.1, so we have 3 weeks (starting
now) to implement it.  It's acceptable if what we do breaks multi-view
annotators/applications, but it cannot break single-view
annotators/applications.

We want whatever we do to be easier to document and explain to users
than what we currently have.


2. Proposed Solution

We don't plan to change the fundamental design of views at this point
- there isn't time and it's too controversial.  A view still consists
of an index repository and a Sofa.  (Yes, I know someday a view may
not have a Sofa - but for now, it does.)

A. New CasView interface

We create a new interface CasView.  All of the CAS.getView() methods
will now return type CasView (instead of CAS).

The CasView interface will contain all of the sofa-access methods and
indexing-related methods that are on the CAS interface.

A more controversial question is whether you can create FS from a
CasView - i.e., does the method CasView.createFS(Type) exist?  What
about CasView.createAnnotation(Type, int begin, int end)?

In some previous discussions we said no, FS creation is on the CAS
only.  This communicates to the user that, logically, FS creation is
an operation on the CAS as a whole.  However after thinking about this
more I think that may be too inconvenient.  If I have a handle to a
view and want to create an FS I'd have to do:

myView.getCAS().createFS(type);

which is a little tedious. (and this assumes we have CasView.getCAS(),
without which it is much worse).  And what about annotations, which
need a Sofa reference, so we would need something like:

myView.getCAS().createAnnotation(type, myView.getSofa(), begin, end)

which is too ugly to consider.

All in all I think it would be better to allow FS creation on the
CasView interface as well as the CAS interface.  I think we can
explain this.  A view is a window into a CAS - a particular way of
looking at it - it should be a fully functional interface for
interacting with the CAS from that viewpoint.  And that would include
creating new FS.



B. Backwards Compatibility

To meet the goal of being compatible with single-view annotators, we
will use the following strategy:

The idea is that a CAS has a current view.  Any methods on the CAS
that are view-oriented will apply to the current view.  This includes
but is not limited to:
getSofa()
getDocumentText()
getIndexRepository()
addFsToIndexes()
createAnnotation(int begin, int end) //needs to know which Sofa to refer to



The current view is determined by the framework and can be different
for different annotators.  For single-sofa annotators the current view
is the view that the annotator should process, as determined by sofa
mappings in the usual way.

Note that this approach also allows single-sofa application code to
work.  We have a lot of code that does:
AnalysisEngine ae = ...
CAS cas = ae.newCAS();
cas.setDocumentText(someString);
ae.process(cas);

and it would be really nice if this continues to work.

We could deprecate these APIs and encourage people to switch to the
view-oriented APIs, which would be something like:
AnalysisEngine ae = ...
CAS cas = ae.newCAS();
CasView initialView = cas.getInitialView();
initialView.setDocumentText(someString);
ae.process(cas);






Re: CAS and CasView - concrete proposal

2006-12-29 Thread Adam Lally

On 12/29/06, Marshall Schor [EMAIL PROTECTED] wrote:

snip/
It seems to me you will need a CasViewImpl class - this is for the use
case where the user wants to, e.g., run two iterators together, one
iterating over one view, while the other goes over another view.

The actual objects that implement CAS views could be tiny - just a
specification of which view it was,
plus a ref to the shared CASImpl.  Is this what you're thinking?



Details... Yes, I think there would need to be a CasViewImpl.  I don't
have strong feelings about how to implement it.  But unless there are
reasons to change things, I think we may want to stick close to how
views are currently implemented, which is that the CasViewImpl owns
its index repository and also has a direct reference to the heap and
other internal parts of the CAS.  These references save having to make
an extra method call to implement methods such as createFS.
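
A minimal sketch of that arrangement, with hypothetical names (Heap, SketchViewImpl) and a heap reduced to a list of strings: each view owns its own index and holds a direct reference to the shared heap, so createFS needs no extra hop through a CAS object, and two views can be iterated side by side.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; the real CAS heap and view implementation differ.
final class SharedHeapSketch {

    /** Stand-in for the CAS heap shared by every view. */
    static final class Heap {
        private final List<String> cells = new ArrayList<String>();

        int allocate(String payload) {
            cells.add(payload);
            return cells.size() - 1;
        }

        String get(int ref) {
            return cells.get(ref);
        }
    }

    /** A view implementation: owns its index, shares the heap directly. */
    static final class SketchViewImpl {
        private final Heap heap;                                      // shared
        private final List<Integer> index = new ArrayList<Integer>(); // per view

        SketchViewImpl(Heap heap) {
            this.heap = heap;
        }

        /** Writes straight to the shared heap; no detour through a CAS object. */
        int createFS(String payload) {
            return heap.allocate(payload);
        }

        void addFsToIndexes(int ref) {
            index.add(ref);
        }

        Iterator<Integer> iterator() {
            return index.iterator();
        }

        String resolve(int ref) {
            return heap.get(ref);
        }
    }

    public static void main(String[] args) {
        Heap heap = new Heap();
        SketchViewImpl english = new SketchViewImpl(heap);
        SketchViewImpl german = new SketchViewImpl(heap);

        english.addFsToIndexes(english.createFS("house"));
        german.addFsToIndexes(german.createFS("Haus"));

        // Two iterators running together, one per view.
        Iterator<Integer> en = english.iterator();
        Iterator<Integer> de = german.iterator();
        while (en.hasNext() && de.hasNext()) {
            System.out.println(english.resolve(en.next()) + " / " + german.resolve(de.next()));
        }
    }
}
```
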


Also, I forgot to mention in the proposal - I think we'll need to have
a JCasView as well.  Some thought should be put into how this relates
to your recent refactoring that produced CommonCas... we may need a
CommonCasView.  But I think we can work these details out if the
general ideas in my proposal are acceptable.

-Adam


Re: CAS and CasView - concrete proposal

2006-12-29 Thread Thilo Goetz

So your proposal is to leave things as they are, except that we call
some of the things that we used to call a CAS a CasView.  We're not 
going to touch how indexing works, at least conceptually.  We could 
implement this proposal by simply making the CASImpl class implement the 
CasView interface and we would be more or less done.


Is that a correct interpretation, or did I miss something?

--Thilo