Richard Eckart de Castilho created UIMA-2812:
------------------------------------------------
Summary: Support ResultSpecification
Key: UIMA-2812
URL: https://issues.apache.org/jira/browse/UIMA-2812
Project: UIMA
Issue Type: New Feature
Components: uimaFIT
Reporter: Richard Eckart de Castilho
Provide support for controlling the output of a component using a
ResultSpecification. Consider, for example, the use case where a component can
produce a "PartOfSpeech" annotation but should not, because another component
in the same pipeline has already produced it or will produce it later. Here is
some pseudocode:
{noformat}
AnalysisEngineDescription aed = createPrimitiveDescription(Parser.class);
// Tell Parser not to produce PartOfSpeech annotations
ResultUtil.removeType(aed, PartOfSpeech.class);
{noformat}
*How to "remove" a type?* UIMA requires that a ResultSpecification contains
_all_ the types that the component produces, which would normally require
adding all types except the ones that should not be produced. uimaFIT has access
to _capability_ annotations, which it could use to pre-fill a result
specification with all the types that a component could produce, allowing the
user to conveniently remove the ones not required.
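The pre-fill-then-remove idea can be sketched in plain Java. This is only a model of the proposal, not UIMA or uimaFIT code: the class ResultSpecSketch, its methods, and the use of plain strings as type names are all hypothetical, and the set stands in for a real ResultSpecification.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: a result specification is represented as a set of type names.
class ResultSpecSketch {

    /** Pre-fill the specification with every output type the component declares. */
    static Set<String> prefillFromCapabilities(List<String> declaredOutputs) {
        return new LinkedHashSet<>(declaredOutputs);
    }

    /** Remove a type that should not be produced; returns true if it was present. */
    static boolean removeType(Set<String> resultSpec, String typeName) {
        return resultSpec.remove(typeName);
    }

    public static void main(String[] args) {
        // Assumed output capabilities of an imaginary parser component.
        List<String> parserOutputs = Arrays.asList("Sentence", "PartOfSpeech", "Constituent");

        Set<String> spec = prefillFromCapabilities(parserOutputs);
        removeType(spec, "PartOfSpeech");

        System.out.println(spec); // prints [Sentence, Constituent]
    }
}
```

The point of the sketch is that the user names only the type to drop; the remaining types come from the declared capabilities rather than being listed by hand.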
*How to transport the information?* Unfortunately, there appears to be no way
to store the ResultSpecification as part of an _AnalysisEngineDescription_. As
far as I can see, UIMA has two ways to control the ResultSpecification for a
component:
* via the component's _capabilities_
* via a parameter passed to the _AnalysisEngine.process_ method (or via
_setResultSpecification_)
There are two scenarios I can imagine:
* _at description time_: changes to the result specification are added to the
descriptor.
** Add the ResultSpecification to the component descriptor. Unfortunately, this
is not supported by UIMA.
** Change the _capabilities_. E.g. uimaFIT creates an AE descriptor with the
capabilities filled in, then one could add or remove types/features there.
* _at runtime_: uimaFIT could be used to acquire an initial ResultSpecification
from the annotation on the AE class, which can then be modified to add/remove
types/features. The final specification then needs to be passed in some way
into the pipeline execution code:
** _along with the component descriptor_: pairs of {descriptor, resultspec}
would need to be passed to the pipeline execution code (e.g. SimplePipeline),
making the API more complex.
** _as part of already instantiated components_: in the case of SimplePipeline,
there are also non-descriptor-based methods that could be used, in which case
the result specifications could be set on each component individually before
passing them into the pipeline code.
*Does it fit into the uimaFIT concept?* So far, it was possible to implement
uimaFIT in such a way that all information pertaining to the component
configuration could be reflected, configured, and stored in a descriptor, so
that any UIMA execution engine could then pick up the descriptor and execute
the component as it was configured. UIMA appears to be lacking the concept of a
ResultSpecification as part of the descriptors. In particular, that seems to
affect the ability to configure results within aggregate analysis engines.
*Conclusion* Since a ResultSpecification cannot be stored in a descriptor, the
next best thing appears to be adding some convenience methods to change the
reflected capabilities in the descriptor.