Re: Singleton FSs, again

2007-05-30 Thread Thilo Goetz
Adam Lally wrote:
 That approach is too brittle for my taste.  An annotator writer would
 declare a type that is meant to be a singleton, but there's no way to
 enforce this.  One careless annotator that creates a second instance of
 such a type, and the whole analysis chain stops working.  With my approach,
 at least the bar is a bit higher.

 
 Well, with FsVariables, one careless annotator can still set the
 variable to a new value and break downstream annotators.  There can
 even be name conflicts with two annotators trying to use the same
 variable name for different things.
 
 It's OK with me to not implement my suggestion because it encourages
 annotator developers to rely on undeclared assumptions (that only a
 single instance of a type exists) - basically the same criticism I had
 of the FsVariable proposal.  In that case let's leave things the way
 they are.  I don't think this is a pressing problem that needs to be
 addressed.
 
 -Adam

Right, it is important to me, though.  I'll put it in the sandbox as an
external tool that people can optionally use.  It won't be pretty, as
annotators that want to use the facility need to declare the necessary
type and index, but I guess that's ok.  We'll see if people want this
or not.

--Thilo



Re: Singleton FSs, again

2007-05-29 Thread Adam Lally

On 5/25/07, Thilo Goetz wrote:

I would like to revive the discussion that started with
http://www.mail-archive.com/uima-dev@incubator.apache.org/msg01299.html



I still have the same concern about this that I posted to the previous thread:

   I think Michael is onto the same point that concerns me.  To use this
   feature, components have to agree on what variable names they are
   going to use.  So we're creating another kind of dependency that I
   believe should be documented in the capabilities.  Sure, people could
   build this themselves already, but if we make it built-in then we're
   strongly encouraging its use and should consider all the implications.
   If we had a more expressive capability spec (so you could say "I
   create/require an instance of type FsVariable with name=Foo") then
   that might be a way to go.

Thilo replied:

I guess we could do that in addition. How would you imagine this would work?



I'm mainly addressing a practical problem here, and want to give people a
viable alternative to modifying the DocumentAnnotation. I think this
approach is fairly forward-compatible as well, in the sense that it can
later be strengthened with descriptor-based integrity constraints.



Actually I'm not happy about making the capabilities more complicated
to handle this.  I'm not sure the benefits of global variables
outweigh either (a) making the capabilities more complicated or (b)
adding/encouraging another set of implicit agreements between
annotators that aren't declared anywhere.

Let's go back and think about the DocumentAnnotation use case.  Users
can already declare their own document metadata type and add it to the
indexes.  Now that we have default bag indexes this is easy to do even
if their document metadata type does not extend annotation.

I think this is most of the way towards addressing the issue.  What
remains are (a) providing convenient access to a single indexed
object, without going through an iterator, and (b) enforcing that
there is only ever a singleton instance of a particular type.

Another suggestion for addressing these issues:
void CAS.indexSingleton(FeatureStructure aFS) throws CASException
FeatureStructure CAS.getSingleton(Type aType) throws CASException

The former is defined to throw an exception if the index over
aFS.getType() is non-empty (for this view - we can have a separate
singleton for each index repository - I think that is what we want
for DocumentAnnotation), and otherwise to add aFS to the indexes.

The latter is defined to throw an exception if there is not exactly
one instance of aType in the indexes for this view, and otherwise to
return the one instance.
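
The semantics of the two proposed methods can be sketched roughly as follows. This is a toy model, not the UIMA API: `ToyView`, `addToIndexes` and `SingletonSketch` are hypothetical names, types are plain strings, feature structures are plain objects, and an unchecked `IllegalStateException` stands in for the `CASException` named in the proposal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Small demo of the toy model below (hypothetical, not UIMA code).
class SingletonSketch {
    public static void main(String[] args) {
        ToyView view = new ToyView();
        Object metadata = new Object();
        view.indexSingleton("my.DocumentMetadata", metadata);
        System.out.println(view.getSingleton("my.DocumentMetadata") == metadata);
        try {
            view.indexSingleton("my.DocumentMetadata", new Object());
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}

// Toy stand-in for the index repository of a single view.
class ToyView {
    private final Map<String, List<Object>> indexed = new HashMap<>();

    // Ordinary indexing: nothing here prevents a second instance of a type.
    void addToIndexes(String typeName, Object fs) {
        indexed.computeIfAbsent(typeName, k -> new ArrayList<>()).add(fs);
    }

    // Proposed indexSingleton: fail if the index over the type is non-empty,
    // otherwise add the object to the indexes.
    void indexSingleton(String typeName, Object fs) {
        if (!instancesOf(typeName).isEmpty()) {
            throw new IllegalStateException("already indexed: " + typeName);
        }
        addToIndexes(typeName, fs);
    }

    // Proposed getSingleton: fail unless exactly one instance is indexed.
    Object getSingleton(String typeName) {
        List<Object> existing = instancesOf(typeName);
        if (existing.size() != 1) {
            throw new IllegalStateException("expected exactly one instance of "
                    + typeName + ", found " + existing.size());
        }
        return existing.get(0);
    }

    private List<Object> instancesOf(String typeName) {
        List<Object> existing = indexed.get(typeName);
        return existing == null ? Collections.<Object>emptyList() : existing;
    }
}
```

In this model each `ToyView` keeps its own singleton per type, matching the "separate singleton for each index repository" remark. Note that a plain `addToIndexes` call bypasses the check, after which `getSingleton` throws for that type.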



I like this better since it doesn't introduce yet another name space
that annotators have to agree on amongst each other.

-Adam


Re: Singleton FSs, again

2007-05-29 Thread Thilo Goetz
Adam Lally wrote:
 Another suggestion for addressing these issues:
 void CAS.indexSingleton(FeatureStructure aFS) throws CASException
 FeatureStructure CAS.getSingleton(Type aType) throws CASException
 
 The former is defined to throw an exception if the index over
 aFS.getType() is non-empty (for this view - we can have a separate
 singleton for each index repository - I think that is what we want
 for DocumentAnnotation), and otherwise to add aFS to the indexes.
 
 The latter is defined to throw an exception if there is not exactly
 one instance of aType in the indexes for this view, and otherwise to
 return the one instance.
 
 
 
 I like this better since it doesn't introduce yet another name space
 that annotators have to agree on amongst each other.

That approach is too brittle for my taste.  An annotator writer would
declare a type that is meant to be a singleton, but there's no way to
enforce this.  One careless annotator that creates a second instance of
such a type, and the whole analysis chain stops working.  With my approach,
at least the bar is a bit higher.

--Thilo

 



Re: Singleton FSs, again

2007-05-29 Thread Adam Lally

That approach is too brittle for my taste.  An annotator writer would
declare a type that is meant to be a singleton, but there's no way to
enforce this.  One careless annotator that creates a second instance of
such a type, and the whole analysis chain stops working.  With my approach,
at least the bar is a bit higher.



Well, with FsVariables, one careless annotator can still set the
variable to a new value and break downstream annotators.  There can
even be name conflicts with two annotators trying to use the same
variable name for different things.

It's OK with me to not implement my suggestion because it encourages
annotator developers to rely on undeclared assumptions (that only a
single instance of a type exists) - basically the same criticism I had
of the FsVariable proposal.  In that case let's leave things the way
they are.  I don't think this is a pressing problem that needs to be
addressed.

-Adam


Re: Singleton FSs, again

2007-05-28 Thread Thilo Goetz

Eddie Epstein wrote:

On 5/25/07, Thilo Goetz wrote:
Technically, the proposal consists of a new built-in type and new built-in
index as follows.

- a type uima.cas.FsVariable that inherits from uima.cas.TOP with features
  name:String, type:String and value:TOP.


The feature called value, as type TOP, can only hold a reference to
another FS. So, it is not possible to create an FsVariable with a
double valued feature, or with an Integer feature, etc. As you say, it
is an FS variable.

I'm still a little fuzzy about the scope of what can be done with this
proposal. Looking back at the previous discussion, Adam said:


I think one use case is the singleton use case.  You could define a
global variable called myapp.documentMetadata and set its value to
an instance of myTypeSystem.DocumentMetadata.  Then all your
annotators could access it by getting the value of this global
variable.


The application has to create a custom type,
myTypeSystem.DocumentMetadata, which is good because it is documented
in the descriptors. So the FsVariable is a mechanism to get to the
single instance of a custom type.

I admit that the alternative of creating a custom set index for a type
is a bit much for most users.


Yes, that is *the* use case.  People want to create document metadata.  They
often do this by adding new features to DocumentAnnotation, which leads to
problems, mainly in conjunction with the JCas (because everybody creates a
different cover class for the DocumentAnnotation, but only one of those cover
classes is actually loaded; this problem will not go away until we have a
separate class loader for each annotator.  We're one step closer to at least
being able to have that with Marshall's ongoing work on JCas class loading).



Joern said:

Imagine an annotator that is a spam filter: it has to
put a tag in the CAS which says spam or no_spam.

The document language is also an example for a global variable.


These examples would not be covered because the FsVariable can only
point to another FS, not hold an arbitrary value; correct?



True, but Joern didn't really say he was talking about string values.
You could also imagine having a DocumentLanguage FS that holds one or
more string valued features.


- a built-in set index over FsVariable, sorted by the name feature.

The APIs to define and access these critters would look as follows.

// Declare a new global variable/singleton FS
declareFsVariable(String name, Type type)


What happens when a variable is declared?


First we check if an FsVariable object with the given name already exists
in the FsVariable index.  If it does, we throw an exception.  If not, we
create a new FsVariable with the name feature set to the name parameter,
the type feature set to type.getName() and value set to null.  This we
put in the index.





// Check if a variable of that name exists
isFsVariable(String name):boolean

// Get the type of a variable
getFsVariableType(String name):Type


This just returns the String value for Type, yes?


Not sure what you mean.  It checks if a variable with that name exists, and
if yes, returns the type object corresponding to the type feature of the
FsVariable.  If no such variable exists, either return null or throw an
exception (I'd favor the latter in this case).





// Get all variables of a given type
listFsVariables(Type type):List


What exactly is the List returned?


A list of Strings, containing all names of FsVariables declared for the
input type.


And finally, looks like I missed a couple of pretty crucial methods.

// Retrieve a certain FsVariable value.  May return null.
getVariableValue(String name):FeatureStructure 


// Set a variable value.
setVariable(String name, FeatureStructure fs):void

Those also would throw an exception if the variable did not exist, or, in
the latter case, was of the wrong type.
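
Taken together, the behaviour described for these methods might look roughly like the sketch below. It is a toy model under stated assumptions, not UIMA code: `ToyFs`, `ToyFsVariables` and `FsVariableSketch` are hypothetical names, types are plain strings compared by exact name (subtyping is ignored), a `TreeMap` stands in for the set index sorted by name, unchecked exceptions replace whatever exception type would really be used, and setting a variable back to null is assumed to be allowed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Small demo of the toy model below (hypothetical, not UIMA code).
class FsVariableSketch {
    public static void main(String[] args) {
        ToyFsVariables vars = new ToyFsVariables();
        vars.declareFsVariable("myapp.documentMetadata", "myTypeSystem.DocumentMetadata");
        vars.setVariable("myapp.documentMetadata", new ToyFs("myTypeSystem.DocumentMetadata"));
        System.out.println(vars.getFsVariableType("myapp.documentMetadata"));
        System.out.println(vars.listFsVariables("myTypeSystem.DocumentMetadata"));
    }
}

// Toy feature structure that only knows the name of its type.
class ToyFs {
    final String typeName;
    ToyFs(String typeName) { this.typeName = typeName; }
}

class ToyFsVariables {
    // Mirrors the name, type and value features of the proposed FsVariable type.
    private static final class Var {
        final String typeName;
        ToyFs value; // null until setVariable is called
        Var(String typeName) { this.typeName = typeName; }
    }

    // Stand-in for the built-in set index over FsVariable, sorted by name.
    private final Map<String, Var> byName = new TreeMap<>();

    // Throws if a variable with this name already exists; value starts as null.
    void declareFsVariable(String name, String typeName) {
        if (byName.containsKey(name)) {
            throw new IllegalStateException("already declared: " + name);
        }
        byName.put(name, new Var(typeName));
    }

    boolean isFsVariable(String name) {
        return byName.containsKey(name);
    }

    // Throws (rather than returning null) if the variable does not exist.
    String getFsVariableType(String name) {
        return require(name).typeName;
    }

    // Names of all variables declared for the given type.
    List<String> listFsVariables(String typeName) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Var> entry : byName.entrySet()) {
            if (entry.getValue().typeName.equals(typeName)) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    // May return null if the variable was declared but never set.
    ToyFs getVariableValue(String name) {
        return require(name).value;
    }

    // Throws if the variable does not exist or the value has the wrong type.
    void setVariable(String name, ToyFs fs) {
        Var variable = require(name);
        if (fs != null && !fs.typeName.equals(variable.typeName)) {
            throw new IllegalArgumentException("wrong type for " + name + ": " + fs.typeName);
        }
        variable.value = fs;
    }

    private Var require(String name) {
        Var variable = byName.get(name);
        if (variable == null) {
            throw new IllegalArgumentException("no such variable: " + name);
        }
        return variable;
    }
}
```

Nothing in this model stops a second caller from overwriting a value with `setVariable`, or two components from picking the same variable name for different purposes; it only rejects duplicate declarations and mistyped values.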



Thanks,
Eddie