Re: Singleton FSs, again
Adam Lally wrote:

That approach is too brittle for my taste. An annotator writer would declare a type that is meant to be a singleton, but there's no way to enforce this. One careless annotator that creates a second instance of such a type, and the whole analysis chain stops working. With my approach, at least the bar is a bit higher.

Well, with FsVariables, one careless annotator can still set the variable to a new value and break downstream annotators. There can even be name conflicts with two annotators trying to use the same variable name for different things.

It's OK with me to not implement my suggestion because it encourages annotator developers to rely on undeclared assumptions (that only a single instance of a type exists) - basically the same criticism I had of the FsVariable proposal. In that case let's leave things the way they are. I don't think this is a pressing problem that needs to be addressed.

-Adam

Right, it is important to me, though. I'll put it in the sandbox as an external tool that people can optionally use. It won't be pretty, as annotators that want to use the facility need to declare the necessary type and index, but I guess that's ok. We'll see if people want this or not.

--Thilo
Re: Singleton FSs, again
On 5/25/07, Thilo Goetz [EMAIL PROTECTED] wrote:

I would like to revive the discussion that started with http://www.mail-archive.com/uima-dev@incubator.apache.org/msg01299.html

I still have the same concern about this that I posted to the previous thread:

I think Michael is onto the same point that concerns me. To use this feature, components have to agree on what variable names they are going to use. So we're creating another kind of dependency that I believe should be documented in the capabilities. Sure, people could build this themselves already, but if we make it built-in then we're strongly encouraging its use and should consider all the implications. If we had more expressive capability spec (so you could say I create/require an instance of type FsVariable with name=Foo) then that might be a way to go.

Thilo replied:

I guess we could do that in addition. How would you imagine this would work? I'm mainly addressing a practical problem here, and want to give people a viable alternative to modifying the DocumentAnnotation. I think this approach is fairly forward-compatible as well, in the sense that it can later be strengthened with descriptor-based integrity constraints.

Actually I'm not happy about making the capabilities more complicated to handle this. I'm not sure the benefits of global variables outweigh either (a) making the capabilities more complicated or (b) adding/encouraging another set of implicit agreements between annotators that aren't declared anywhere.

Let's go back and think about the DocumentAnnotation use case. Users can already declare their own document metadata type and add it to the indexes. Now that we have default bag indexes this is easy to do even if their document metadata type does not extend annotation. I think this is most of the way towards addressing the issue.

What remains are (a) providing convenient access to a single indexed object, without going through an iterator, and (b) enforcing that there is only ever a singleton instance of a particular type.

Another suggestion for addressing these issues:

    void CAS.indexSingleton(FeatureStructure aFS) throws CASException
    FeatureStructure CAS.getSingleton(Type aType) throws CASException

The former is defined to throw an exception if the index over aFS.getType() is non-empty (for this view - we can have a separate singleton for each index repository - I think that is what we want for DocumentAnnotation), and otherwise to add aFS to the indexes.

The latter is defined to throw an exception if there is not exactly one instance of aType in the indexes for this view, and otherwise to return the one instance.

I like this better since it doesn't introduce yet another name space that annotators have to agree on amongst each other.

-Adam
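For readers following the thread, the semantics Adam describes for indexSingleton and getSingleton can be sketched roughly as below. This is only an illustration under stated assumptions, not the UIMA API: Fs and View are hypothetical stand-ins for FeatureStructure and a single view's index repository, types are identified by plain name strings, and IllegalStateException stands in for the CASException named in the proposal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-in sketch only; none of these classes are part of UIMA. */
final class SingletonSketch {

    /** Hypothetical stand-in for a feature structure: a type name plus a payload. */
    static final class Fs {
        final String type;
        final String payload;
        Fs(String type, String payload) { this.type = type; this.payload = payload; }
    }

    /** Hypothetical stand-in for the indexes of one view. */
    static final class View {
        private final Map<String, List<Fs>> index = new HashMap<>();

        /** Ordinary indexing with no uniqueness check, like a bag index. */
        void addToIndexes(Fs fs) {
            index.computeIfAbsent(fs.type, k -> new ArrayList<>()).add(fs);
        }

        /** Proposed indexSingleton: fail if the index over the FS's type is non-empty. */
        void indexSingleton(Fs fs) {
            if (!index.getOrDefault(fs.type, Collections.emptyList()).isEmpty()) {
                throw new IllegalStateException("index over " + fs.type + " is not empty");
            }
            addToIndexes(fs);
        }

        /** Proposed getSingleton: fail unless exactly one instance is indexed in this view. */
        Fs getSingleton(String type) {
            List<Fs> found = index.getOrDefault(type, Collections.emptyList());
            if (found.size() != 1) {
                throw new IllegalStateException(
                        "expected exactly one " + type + ", found " + found.size());
            }
            return found.get(0);
        }
    }

    public static void main(String[] args) {
        View view = new View();
        view.indexSingleton(new Fs("myTypeSystem.DocumentMetadata", "lang=en"));
        System.out.println(view.getSingleton("myTypeSystem.DocumentMetadata").payload);

        // An annotator that bypasses indexSingleton can still add a second instance,
        // after which every downstream getSingleton call fails.
        view.addToIndexes(new Fs("myTypeSystem.DocumentMetadata", "lang=de"));
        try {
            view.getSingleton("myTypeSystem.DocumentMetadata");
        } catch (IllegalStateException e) {
            System.out.println("lookup failed: " + e.getMessage());
        }
    }
}
```

The second half of main shows why the uniqueness check is only advisory in this sketch: nothing stops a component from using the ordinary indexing path.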
Re: Singleton FSs, again
Adam Lally wrote:

On 5/25/07, Thilo Goetz [EMAIL PROTECTED] wrote:

I would like to revive the discussion that started with http://www.mail-archive.com/uima-dev@incubator.apache.org/msg01299.html

I still have the same concern about this that I posted to the previous thread:

I think Michael is onto the same point that concerns me. To use this feature, components have to agree on what variable names they are going to use. So we're creating another kind of dependency that I believe should be documented in the capabilities. Sure, people could build this themselves already, but if we make it built-in then we're strongly encouraging its use and should consider all the implications. If we had more expressive capability spec (so you could say I create/require an instance of type FsVariable with name=Foo) then that might be a way to go.

Thilo replied:

I guess we could do that in addition. How would you imagine this would work? I'm mainly addressing a practical problem here, and want to give people a viable alternative to modifying the DocumentAnnotation. I think this approach is fairly forward-compatible as well, in the sense that it can later be strengthened with descriptor-based integrity constraints.

Actually I'm not happy about making the capabilities more complicated to handle this. I'm not sure the benefits of global variables outweigh either (a) making the capabilities more complicated or (b) adding/encouraging another set of implicit agreements between annotators that aren't declared anywhere.

Let's go back and think about the DocumentAnnotation use case. Users can already declare their own document metadata type and add it to the indexes. Now that we have default bag indexes this is easy to do even if their document metadata type does not extend annotation. I think this is most of the way towards addressing the issue.

What remains are (a) providing convenient access to a single indexed object, without going through an iterator, and (b) enforcing that there is only ever a singleton instance of a particular type.

Another suggestion for addressing these issues:

    void CAS.indexSingleton(FeatureStructure aFS) throws CASException
    FeatureStructure CAS.getSingleton(Type aType) throws CASException

The former is defined to throw an exception if the index over aFS.getType() is non-empty (for this view - we can have a separate singleton for each index repository - I think that is what we want for DocumentAnnotation), and otherwise to add aFS to the indexes.

The latter is defined to throw an exception if there is not exactly one instance of aType in the indexes for this view, and otherwise to return the one instance.

I like this better since it doesn't introduce yet another name space that annotators have to agree on amongst each other.

That approach is too brittle for my taste. An annotator writer would declare a type that is meant to be a singleton, but there's no way to enforce this. One careless annotator that creates a second instance of such a type, and the whole analysis chain stops working. With my approach, at least the bar is a bit higher.

--Thilo

-Adam
Re: Singleton FSs, again
That approach is too brittle for my taste. An annotator writer would declare a type that is meant to be a singleton, but there's no way to enforce this. One careless annotator that creates a second instance of such a type, and the whole analysis chain stops working. With my approach, at least the bar is a bit higher.

Well, with FsVariables, one careless annotator can still set the variable to a new value and break downstream annotators. There can even be name conflicts with two annotators trying to use the same variable name for different things.

It's OK with me to not implement my suggestion because it encourages annotator developers to rely on undeclared assumptions (that only a single instance of a type exists) - basically the same criticism I had of the FsVariable proposal. In that case let's leave things the way they are. I don't think this is a pressing problem that needs to be addressed.

-Adam
Re: Singleton FSs, again
Eddie Epstein wrote:

On 5/25/07, Thilo Goetz [EMAIL PROTECTED] wrote:

Technically, the proposal consists of a new built-in type and new built-in index as follows.

- a type uima.cas.FsVariable that inherits from uima.cas.TOP with features name:String, type:String and value:TOP.

The feature called value, as type TOP, can only hold a reference to another FS. So, it is not possible to create an FsVariable with a double valued feature, or with an Integer feature, etc. As you say, it is an FS variable.

I'm still a little fuzzy about the scope of what can be done with this proposal. Looking back at the previous discussion, Adam said:

I think one use case is the singleton use case. You could define a global variable called myapp.documentMetadata and set its value to an instance of myTypeSystem.DocumentMetadata. Then all your annotators could access it by getting the value of this global variable.

The application has to create a custom type, myTypeSystem.DocumentMetadata, which is good because it is documented in the descriptors. So the FsVariable is a mechanism to get to the single instance of a custom type. I admit that the alternative of creating a custom set index for a type is a bit much for most users.

Yes, that is *the* use case. People want to create document metadata. They often do this by adding new features to DocumentAnnotation, which leads to problems, mainly in conjunction with the JCas (because everybody creates a different cover class for the DocumentAnnotation, but only one of those cover classes is actually loaded; this problem will not go away until we have a separate class loader for each annotator. We're one step closer to at least being able to have that with Marshall's ongoing work on JCas class loading).

Jorn said:

Imagine an Annotator which is a spam filter, it has to put a tag to the CAS which say spam or no_spam. The document language is also an example for a global variable.

These examples would not be covered because the FsVariable can only point to another FS, not hold an arbitrary value; correct?

True, but Joern didn't really say he was talking about string values. You could also imagine having a DocumentLanguage FS that holds one or more string valued features.

- a built-in set index over FsVariable, sorted by the name feature.

The APIs to define and access these critters would look as follows.

    // Declare a new global variable/singleton FS
    declareFsVariable(String name, Type type)

What happens when a variable is declared?

First we check if a FSVariable object with the given name already exists in the FSVariable index. If it does, we throw an exception. If not, we create a new FSVariable with the name feature set to the name parameter, the type feature set to type.getName() and value set to null. This we put in the index.

    // Check if a variable of that name exists
    isFsVariable(String name):boolean

    // Get the type of a variable
    getFsVariableType(String name):Type

This just returns the String value for Type, yes?

Not sure what you mean. It checks if a variable with name exists, and if yes, returns the type object corresponding to the type feature of the FSVariable. If no such variable exists, either return null or throw an exception (I'd favor the latter in this case).

    // Get all variables of a given type
    listFsVariables(Type type):List

What exactly is the List returned?

A list of Strings, containing all names of FSVariables declared for the input type.

And finally, looks like I missed a couple of pretty crucial methods.

    // Retrieve a certain FSVariable value. May return null.
    getVariableValue(String name):FeatureStructure

    // Set a variable value.
    setVariable(String name, FeatureStructure fs):void

Those also would throw an exception if the variable did not exist, or, in the latter case, was of the wrong type.

Thanks,
Eddie
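To make the behaviour described in this message concrete, here is a minimal sketch of the FsVariable operations as a plain Java registry. It is an illustration under assumptions, not an implementation inside the CAS: Fs and Registry are hypothetical stand-ins, types are plain name strings, a sorted map plays the role of the set index keyed on the name feature, the type check is an exact name match (a real type system would presumably also accept subtypes), and the exception classes are arbitrary choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Stand-in sketch only; none of these classes are part of UIMA. */
final class FsVariableSketch {

    /** Hypothetical stand-in for a feature structure, identified by its type name. */
    static final class Fs {
        final String type;
        Fs(String type) { this.type = type; }
    }

    /** One declared variable: the declared type name and the current value (may be null). */
    private static final class Variable {
        final String type;
        Fs value;
        Variable(String type) { this.type = type; }
    }

    static final class Registry {
        // Sorted by name, mirroring the proposed set index over the name feature.
        private final Map<String, Variable> byName = new TreeMap<>();

        /** Declares a variable; a second declaration under the same name is an error. */
        void declareFsVariable(String name, String type) {
            if (byName.containsKey(name)) {
                throw new IllegalStateException("variable already declared: " + name);
            }
            byName.put(name, new Variable(type)); // value starts out null
        }

        boolean isFsVariable(String name) {
            return byName.containsKey(name);
        }

        /** Throws rather than returning null when the variable is unknown. */
        String getFsVariableType(String name) {
            return lookup(name).type;
        }

        /** Names of all variables declared for the given type, in name order. */
        List<String> listFsVariables(String type) {
            List<String> names = new ArrayList<>();
            for (Map.Entry<String, Variable> e : byName.entrySet()) {
                if (e.getValue().type.equals(type)) {
                    names.add(e.getKey());
                }
            }
            return names;
        }

        /** May return null if the variable was declared but never set. */
        Fs getVariableValue(String name) {
            return lookup(name).value;
        }

        /** Rejects unknown variables and values of the wrong type. */
        void setVariable(String name, Fs fs) {
            Variable v = lookup(name);
            if (fs != null && !v.type.equals(fs.type)) {
                throw new IllegalArgumentException(
                        name + " holds " + v.type + ", not " + fs.type);
            }
            v.value = fs;
        }

        private Variable lookup(String name) {
            Variable v = byName.get(name);
            if (v == null) {
                throw new IllegalArgumentException("no such variable: " + name);
            }
            return v;
        }
    }

    public static void main(String[] args) {
        Registry vars = new Registry();
        vars.declareFsVariable("myapp.documentMetadata", "myTypeSystem.DocumentMetadata");
        vars.setVariable("myapp.documentMetadata", new Fs("myTypeSystem.DocumentMetadata"));
        System.out.println(vars.getFsVariableType("myapp.documentMetadata"));
        System.out.println(vars.listFsVariables("myTypeSystem.DocumentMetadata"));
    }
}
```

Note that nothing in this sketch prevents a later annotator from calling setVariable again with a different value of the right type.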