Richard Eckart de Castilho created UIMA-2812:
------------------------------------------------

             Summary: Support ResultSpecification
                 Key: UIMA-2812
                 URL: https://issues.apache.org/jira/browse/UIMA-2812
             Project: UIMA
          Issue Type: New Feature
          Components: uimaFIT
            Reporter: Richard Eckart de Castilho


Provide support for controlling the output of a component using a 
ResultSpecification. Consider, e.g., the use case where a component can produce 
"PartOfSpeech" annotations but should not, because another component in the 
same pipeline has already produced them or will produce them later. Here is some 
pseudocode:

{noformat}
AnalysisEngineDescription aed = createPrimitiveDescription(Parser.class);
// Tell Parser not to produce PartOfSpeech annotations
// (ResultUtil is the proposed helper; it does not exist in uimaFIT yet)
ResultUtil.removeType(aed, PartOfSpeech.class);
{noformat}

*How to "remove" a type?* UIMA requires that a ResultSpecification lists 
_all_ the types that the component is expected to produce, which would normally 
require adding every type except the ones that should not be produced. uimaFIT 
has access to _capability_ annotations, which it could use to pre-fill a result 
specification with all the types a component can produce, allowing the 
user to conveniently remove the ones that are not required.
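
The pre-fill-then-remove idea can be sketched without any UIMA dependency. This is a minimal illustration under stated assumptions: a hypothetical _@OutputTypes_ annotation stands in for uimaFIT's capability annotation, and the result specification is modeled as a plain set of type names. _ResultSpecSketch_, _initialSpec_ and _removeType_ are illustrative names, not uimaFIT or UIMA API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for a uimaFIT capability annotation declaring output types.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface OutputTypes {
    String[] value();
}

// Hypothetical component that declares it can produce three types.
@OutputTypes({"Token", "PartOfSpeech", "Constituent"})
class Parser {
}

public class ResultSpecSketch {
    // Pre-fill the modeled "result specification" with every declared output type.
    static Set<String> initialSpec(Class<?> componentClass) {
        OutputTypes outputs = componentClass.getAnnotation(OutputTypes.class);
        Set<String> spec = new LinkedHashSet<>();
        if (outputs != null) {
            for (String type : outputs.value()) {
                spec.add(type);
            }
        }
        return spec;
    }

    // "Remove" a type by dropping it from the pre-filled set.
    static void removeType(Set<String> spec, String typeName) {
        spec.remove(typeName);
    }

    public static void main(String[] args) {
        Set<String> spec = initialSpec(Parser.class);
        removeType(spec, "PartOfSpeech");
        System.out.println(spec); // prints [Token, Constituent]
    }
}
```

The user only names what to drop; the full list comes from the annotation on the class, which is what makes removal convenient.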

*How to transport the information?* Unfortunately, there appears to be no way 
to store the ResultSpecification as part of an _AnalysisEngineDescription_. As 
far as I can see, UIMA has two ways to control the ResultSpecification for a 
component:

* via the component's _capabilities_
* via a parameter passed to the _AnalysisEngine.process_ method (or via 
_setResultSpecification_) 

There are two scenarios I can imagine: 

* _at description time_: changes to the result specification are added to the 
descriptor.
** Add the ResultSpecification to the component descriptor -- unfortunately, this 
is not supported by UIMA.
** Change the _capabilities_. E.g., uimaFIT creates an AE descriptor with the 
capabilities filled in; one could then add or remove types/features there.
* _at runtime_: uimaFIT could be used to acquire an initial ResultSpecification 
from the annotation on the AE class, which can then be modified to add/remove 
types/features. The final specification then needs to be passed into the 
pipeline execution code somehow:
** _along with the component descriptor_: pairs of {descriptor, resultspec} 
would need to be passed to the pipeline execution code (e.g. SimplePipeline), 
making the API more complex.
** _as part of already instantiated components_: in the case of SimplePipeline, 
there are also non-descriptor-based methods that could be used, in which case 
the result specifications could be set on each component individually before 
passing the components into the pipeline code.
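
The "already instantiated components" option can be illustrated with a small, self-contained model. Everything here is hypothetical: _SketchComponent_ stands in for an AE instance, a list of strings stands in for the CAS, and _runPipeline_ mimics the instance-based style of SimplePipeline rather than its real signature. The point it shows is that, once each instance carries its own result specification, the pipeline call needs no extra parameter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical component: emits annotations only for types in its result specification.
class SketchComponent {
    private final List<String> producibleTypes;
    private Set<String> resultSpec;

    SketchComponent(String... producibleTypes) {
        this.producibleTypes = Arrays.asList(producibleTypes);
        // By default, the result specification covers everything the component can produce.
        this.resultSpec = new LinkedHashSet<>(this.producibleTypes);
    }

    void setResultSpec(Set<String> spec) {
        this.resultSpec = new LinkedHashSet<>(spec);
    }

    Set<String> getResultSpec() {
        return new LinkedHashSet<>(resultSpec);
    }

    // Append an annotation type to the shared "CAS" only if it was requested.
    void process(List<String> cas) {
        for (String type : producibleTypes) {
            if (resultSpec.contains(type)) {
                cas.add(type);
            }
        }
    }
}

public class RuntimePipelineSketch {
    // Components are already instantiated and configured, so the pipeline
    // needs no {descriptor, resultspec} pairs.
    static List<String> runPipeline(SketchComponent... components) {
        List<String> cas = new ArrayList<>();
        for (SketchComponent c : components) {
            c.process(cas);
        }
        return cas;
    }

    public static void main(String[] args) {
        SketchComponent tagger = new SketchComponent("PartOfSpeech");
        SketchComponent parser = new SketchComponent("PartOfSpeech", "Constituent");

        // Start from the parser's full spec and drop the type the tagger already provides.
        Set<String> spec = parser.getResultSpec();
        spec.remove("PartOfSpeech");
        parser.setResultSpec(spec);

        System.out.println(runPipeline(tagger, parser)); // prints [PartOfSpeech, Constituent]
    }
}
```

The catch, as noted in the list, is that this only works where the pipeline accepts instances; anything that starts from descriptors never sees the modified specification.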

*Does it fit into the uimaFIT concept?* So far, it has been possible to implement 
uimaFIT in such a way that all information pertaining to the component 
configuration could be reflected, configured, and stored in a descriptor, so 
that any UIMA execution engine could then pick up the descriptor and execute 
the component as it was configured. UIMA appears to lack the concept of a 
ResultSpecification as part of the descriptors. In particular, that seems to 
affect the ability to configure results within aggregate analysis engines.

*Conclusion* Since a ResultSpecification cannot be stored in a descriptor, the 
next best thing appears to be adding some convenience methods to change the 
reflected capabilities in the descriptor.
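
A convenience method of the kind proposed here might look roughly like the following. This is a sketch under strong assumptions: _DescriptionSketch_ is a hypothetical, heavily simplified stand-in for an AnalysisEngineDescription that models only the output types of one capability, _removeOutputType_ is an illustrative name in the spirit of the _ResultUtil.removeType_ pseudocode, and _toXmlFragment_ is a simplified rendering, not the actual UIMA descriptor schema. It shows why this route fits the uimaFIT concept: the change lives in the descriptor itself and therefore survives serialization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, heavily simplified stand-in for an AnalysisEngineDescription:
// only the output types of a single capability are modeled.
class DescriptionSketch {
    final String implementationName;
    final List<String> outputTypes;

    DescriptionSketch(String implementationName, String... outputTypes) {
        this.implementationName = implementationName;
        this.outputTypes = new ArrayList<>(Arrays.asList(outputTypes));
    }

    // Simplified rendering; NOT the actual UIMA descriptor XML schema.
    String toXmlFragment() {
        StringBuilder sb = new StringBuilder("<outputs>");
        for (String type : outputTypes) {
            sb.append("<type>").append(type).append("</type>");
        }
        return sb.append("</outputs>").toString();
    }
}

public class CapabilityUtilSketch {
    // Hypothetical convenience method: edits the reflected capabilities in place,
    // returning whether the type was actually declared.
    static boolean removeOutputType(DescriptionSketch desc, String typeName) {
        return desc.outputTypes.remove(typeName);
    }

    public static void main(String[] args) {
        DescriptionSketch aed =
            new DescriptionSketch("Parser", "Token", "PartOfSpeech", "Constituent");
        removeOutputType(aed, "PartOfSpeech");
        System.out.println(aed.toXmlFragment());
        // prints <outputs><type>Token</type><type>Constituent</type></outputs>
    }
}
```

Whether a component actually honors reduced capabilities at runtime is a separate question; this sketch only shows where the information would be stored.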

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
