Re: A proposal for a slight augmentation of aggregate component descriptors

Marshall Schor Fri, 27 Sep 2013 07:19:49 -0700

On 9/27/2013 9:15 AM, Burn Lewis wrote:
> In the GALE project we relied on descriptor editing, removing unwanted
> delegates and their flow element, but in the Watson project we have many
> nested aggregates and removing a delegate that has a parameter overridden
> by its parent aggregate may require many edits.  Hence the advantage of a
> solution built in to UIMA.
>
> Selectively enabling/disabling individual delegates is taking away
> responsibility that I believe is better left solely to the flow
> controller.  It should be the master, telling the framework which delegates
> to initialize and which to ignore, providing a single location to define
> the complete flow.  Custom flow controllers can already read a flow
> definition from a parameter, external or internal, and we could add a way
> to parametrize the flow constraints ... perhaps as a new type that accepts
> a list of delegates or external parameters containing lists.
>
>   <flowConstraints>
>    <flowList>
>      alpha, beta, ${subFlow1},
>      gamma
>      ${subflow2}
>      ...
>    </flowList>
>   </flowConstraints>
<flowConstraints> are currently optional (only the built-in flow controllers
require them), and they are not powerful enough to specify arbitrary flows.
Consider, for example, a custom flow controller that chooses its routing based
on data it finds in the CAS.
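
For reference, the list expansion implied by the quoted <flowList> idea could be
modeled roughly as follows.  This is only a sketch in Python, not UIMA code:
the function name expand_flow_list, the SETTINGS dictionary, and the values
given to subFlow1 and subflow2 are all assumptions made up for illustration,
and the expansion is a single pass (no nested variables).

```python
import re

# Hypothetical external settings; the variable names mirror the quoted example,
# the values are invented for illustration.
SETTINGS = {
    "subFlow1": "delta, epsilon",
    "subflow2": "zeta",
}

_VAR = re.compile(r"\$\{([^}]+)\}")


def expand_flow_list(text, settings):
    """Substitute ${name} references once, then split on commas/whitespace."""
    def lookup(match):
        name = match.group(1)
        if name not in settings:
            raise KeyError(f"no external setting for {name!r}")
        return settings[name]

    expanded = _VAR.sub(lookup, text)
    # Drop the empty strings produced by leading or trailing separators.
    return [key for key in re.split(r"[,\s]+", expanded) if key]


flow = expand_flow_list("alpha, beta, ${subFlow1},\n gamma\n ${subflow2}", SETTINGS)
print(flow)
```

Such a list can only describe a fixed sequence of delegate keys, which is the
limitation noted above: it cannot express routing that depends on CAS contents.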


There is a small difference between skipping a delegate (or picking an
alternative for it) and changing the flow specification, because the flow
controller can route CASes to the same delegate multiple times.  The use case
postulated was that every flow routing that might reach a particular delegate
would either pick among the alternatives for that delegate, or skip it.

I have heard the view that this new facility (skipping some delegate, or
picking among alternatives for some delegate) should have minimal impact on
existing flow controllers, for backward compatibility.  I think this is
possible with a design that specifies delegate alternatives or skipping.  For
alternatives, the choice would be one of several delegates having the same key
name.  For skipping, it would still populate the internal maps with that key
name (so the existing flow controller implementations would continue to find
the delegate), but the actual delegate would be some form of no-op, for both
the initialize and process calls.  This approach would make it more likely that
existing flow controller implementations would continue to work, apart from an
alternative being picked for some delegates and/or some delegates being
skipped.
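
A minimal sketch of that idea, written in Python purely as a model and not
against any UIMA API.  All names here (NoOpDelegate, Tagger,
build_delegate_map, run_flow, the "parser" key, and the shape of the choices
dictionary) are hypothetical; the only point is that every declared key stays
in the map, so a flow controller that looks delegates up by key is unaffected.

```python
SKIP = "skip"  # hypothetical marker meaning "replace this delegate by a no-op"


class NoOpDelegate:
    """Stand-in for a skipped delegate: same calls, no effect."""

    def initialize(self):
        pass

    def process(self, cas):
        return cas


class Tagger:
    """Toy delegate that records its label in the CAS (modeled as a list)."""

    def __init__(self, label):
        self.label = label

    def initialize(self):
        pass

    def process(self, cas):
        cas.append(self.label)
        return cas


def build_delegate_map(declared, choices):
    """declared: (key, alternative, factory) triples, several may share a key.
    choices: key -> alternative name or SKIP; an absent key takes the first
    declared alternative."""
    delegates = {}
    for key, alternative, factory in declared:
        choice = choices.get(key)
        if choice == SKIP:
            delegates.setdefault(key, NoOpDelegate())
        elif choice is None:
            if key not in delegates:
                delegates[key] = factory()
        elif choice == alternative:
            delegates[key] = factory()
    missing = {key for key, _, _ in declared} - delegates.keys()
    if missing:
        raise ValueError(f"no alternative selected for {sorted(missing)}")
    for delegate in delegates.values():
        delegate.initialize()
    return delegates


def run_flow(flow, delegates, cas):
    """Unchanged 'flow controller': looks every key up, may revisit a key."""
    for key in flow:
        cas = delegates[key].process(cas)
    return cas


declared = [
    ("NE-detector", "local", lambda: Tagger("NE-local")),
    ("NE-detector", "remote", lambda: Tagger("NE-remote")),
    ("parser", "default", lambda: Tagger("parser")),
]
choices = {"NE-detector": "remote", "parser": SKIP}
delegates = build_delegate_map(declared, choices)
result = run_flow(["NE-detector", "parser", "NE-detector"], delegates, [])
print(result)
```

Because the flow visits "NE-detector" twice, both visits reach the selected
alternative, and the visit to "parser" falls through the no-op without the
flow itself being edited.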

-Marshall
>
> ~Burn
>
>
> On Fri, Sep 27, 2013 at 8:45 AM, Marshall Schor <[email protected]> wrote:
>
>> On 9/26/2013 5:39 PM, Richard Eckart de Castilho wrote:
>>> On 26.09.2013, at 23:28, Marshall Schor <[email protected]> wrote:
>>>
>>>> I think there's a tradeoff when using Specifications - they're more clear
>>>> when they have the information locally, and harder to understand when they
>>>> point to an unknown arbitrary thing.
>>> It is interesting you mention this, because the documentation clearly states
>>> "As with the delegateAnalysisEngine element, the flowController element may
>>> contain either a complete flowControllerDescription or an import, but the
>>> import is recommended."
>>> (Source:
>>> http://uima.apache.org/d/uimaj-2.4.2/references.html#ugr.ref.xml.component_descriptor.flow_controller
>>> )
>>> Your statement also appears to contradict the idea of a configuration of a
>>> specifier via external variables in the first place, as these contain
>>> information that is not locally available.
>> Yes, it appears this way.  This is why I see these kinds of design choices as
>> shades of grey, because there are arguments on both sides of many issues, and
>> the art seems to be in finding pragmatic compromise choices, driven by actual
>> use cases.
>>
>> We initially did not have external variables; but as UIMA use became more
>> widespread, the users started asking for things along these lines, with clear
>> reasons.
>>>> Generally, the UIMA spec design philosophy has tried to encourage community
>>>> and part-interoperability by leaning toward making things more transparent /
>>>> obvious.
>>> I don't understand this statement. How does the community come in here?
>> Community means to have an active and widespread ecosystem containing component
>> developers, component assemblers, experimenters of all kinds, commercial product
>> developers, and users of all kinds.  These people bring widely different
>> skill-sets with them, and we want to enable the wider community to build upon
>> one another's work, successfully.
>>
>> -Marshall
>>> I see how type system specifiers help interoperability, but I don't actually
>>> see this too much for component specifiers. They appear to be more deployment
>>> specifiers than anything else.
>>> -- Richard
>>>
>>>> -Marshall
>>>> On 9/26/2013 5:06 PM, Richard Eckart de Castilho wrote:
>>>>> Another alternative could even be to control the import to point to the
>>>>> desired flow:
>>>>> <flowController key="[String]">
>>>>>    <import location="${xxx}"/>
>>>>> </flowController>
>>>>>
>>>>> That would completely remove the need for any skipping attributes and work
>>>>> without dynamically generated descriptors.
>>>>> -- Richard
>>>>>
>>>>> On 26.09.2013, at 23:00, Richard Eckart de Castilho <[email protected]> wrote:
>>>>>> Not the controller, but its configuration. The skipping is clearly
>>>>>> affecting the flow. So why not add something to the flowConstraints, e.g.:
>>>>>> <flowConstraints>
>>>>>> <fixedFlow>
>>>>>>   <node>[String]</node>
>>>>>>   <node>[String]</node>
>>>>>>   ...
>>>>>> </fixedFlow>
>>>>>> <skip>
>>>>>>   <node>[String]</node>
>>>>>>   <node>[String]</node>
>>>>>>   ...
>>>>>> </skip>
>>>>>> </flowConstraints>
>>>>>>
>>>>>> or
>>>>>>
>>>>>> <flowConstraints>
>>>>>> <fixedFlow>
>>>>>>   <node skip="true">[String]</node>
>>>>>>   <node>[String]</node>
>>>>>>   ...
>>>>>> </fixedFlow>
>>>>>> </flowConstraints>
>>>>>>
>>>>>> Personally, I'd not make any modifications to the descriptor at all, but
>>>>>> rather would just skip the delegate when programmatically creating the
>>>>>> descriptor. We do that all the time in our experiments. But if that is for
>>>>>> some reason not an option and the extension is a strong requirement, the
>>>>>> change should at least be made at the location that conceptually makes most
>>>>>> sense (imho).
>>>>>> @Marshall: do you want to provide some more background why you do not
>>>>>> simply create the descriptors programmatically and externalize this
>>>>>> skipping, including, etc. into your experimental setup?
>>>>>> -- Richard
>>>>>>
>>>>>> On 26.09.2013, at 22:53, Peter Klügl <[email protected]> wrote:
>>>>>>> On 26.09.2013 22:51, Richard Eckart de Castilho wrote:
>>>>>>>> I believe this is a concern of the flow controller and should not be
>>>>>>>> configured on the delegates, but rather within the flow controller
>>>>>>>> configuration.
>>>>>>> That was also my first guess, but do you really wanna touch or change
>>>>>>> the flow controller for just skipping a component?
>>>>>>> Peter
>>>>>>>
>>>>>>>> -- Richard
>>>>>>>>
>>>>>>>> On 26.09.2013, at 17:23, Marshall Schor <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> To handle the use cases briefly described on the user list for
>>>>>>>>> selectively skipping some annotators in an aggregate, based on some
>>>>>>>>> externally supplied configuration data, I'd like to propose something
>>>>>>>>> along these lines:
>>>>>>>>> * Add to the existing element <delegateAnalysisEngine key="[String]">
>>>>>>>>> one or two additional attributes.  One would be "skip=${xxx}" and the
>>>>>>>>> other would be its inverse (for improved readability, only, not
>>>>>>>>> logically needed): "run=${xxx}", where the value of the parameter would
>>>>>>>>> need to be "true" or "false" (or "yes" or "no").
>>>>>>>>>
>>>>>>>>> The parameter could be written literally as "true", etc., but also could
>>>>>>>>> be written using the standard variable naming syntax used elsewhere in
>>>>>>>>> the descriptors, and would be resolved by settings in the now-standard
>>>>>>>>> "external overrides" files used by UIMA.  This means that the external
>>>>>>>>> overrides would continue to be a place where all of the specific
>>>>>>>>> configuration info for a particular "run" could be placed, together.
>>>>>>>>>
>>>>>>>>> The implementation would do nothing new if the parameters were
>>>>>>>>> indicating to run the delegate, but if they were indicating it should be
>>>>>>>>> skipped or not run, then the effect would be as if the delegate had been
>>>>>>>>> edited out of the xml descriptor.
>>>>>>>>> This would satisfy some pleas from some user groups for help in managing
>>>>>>>>> their descriptors across various related experiments.
>>>>>>>>>
>>>>>>>>> An example: a user might have a delegate which came in two forms: one to
>>>>>>>>> run "locally", and the other to run "remote".
>>>>>>>>>
>>>>>>>>> They could then include both descriptors in the aggregate, and have only
>>>>>>>>> one of them "active", by coding:
>>>>>>>>>
>>>>>>>>> <delegateAnalysisEngine key="NE-detector" run="NE-Detector-local"> ...
>>>>>>>>> </delegateAnalysisEngine>
>>>>>>>>> <delegateAnalysisEngine key="NE-detector" skip="NE-Detector-local"> ...
>>>>>>>>> </delegateAnalysisEngine>
>>>>>>>>>
>>>>>>>>> WDYT?
>>>>>>>>>
>>>>>>>>> -Marshall
>>
