Thanks, Richard and Thilo.
To summarize (I hope correctly):
* The current UIMA descriptor already combines component definitions with
their configuration
* Things relating to flow should be in the flow section (only)
* Better development approach: experiment outside of core
* For this use case, try other approaches
e.g. similar to what others have done in the 3 papers referenced
* Avoid core changes which expand extension points and are invasive
- better to require new flow controller impls, vs. keeping the old design
backwards compatible
I think these are good points, so I'm withdrawing this proposal, and will work
with the users who asked for this on alternatives.
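
For illustration only, here is a toy Java sketch of what the "new flow
controller" route could look like: the run/skip decision lives inside the flow
logic instead of on the delegate element. The class RunFlagFlowSketch, the
computeFlow method, and the delegate keys are hypothetical and deliberately do
not use the real UIMA flow controller API; a real implementation would have to
be written against that API and would read its settings from the controller's
own configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model only: these names are hypothetical and are NOT the UIMA API.
final class RunFlagFlowSketch {

    /**
     * Returns the delegate keys to visit, in order, leaving out any key whose
     * "run" setting is explicitly false. Keys with no setting are kept.
     */
    static List<String> computeFlow(List<String> fixedFlow,
                                    Map<String, Boolean> runSettings) {
        List<String> steps = new ArrayList<>();
        for (String key : fixedFlow) {
            if (runSettings.getOrDefault(key, Boolean.TRUE)) {
                steps.add(key);
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        List<String> flow = List.of("tokenizer", "tagger", "parser");
        Map<String, Boolean> settings = Map.of("tagger", false);
        // Prints [tokenizer, parser]
        System.out.println(computeFlow(flow, settings));
    }
}
```

In this model the skipped delegate never reaches the flow at all, so nothing
outside the flow logic needs to know about the setting.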
-Marshall
On 9/27/2013 5:06 PM, Richard Eckart de Castilho wrote:
> The XML descriptor for AEs is already serving two purposes, it describes
> a component, and it configures it. I do not think it should be further
> watered down by evolving into a DSL for workflows, in particular if that
> means that workflow logic is leaking outside the flow part of the descriptor.
> Combining your proposal with our other proposal regarding the embedding of an
> expression language in the external configuration mechanism further waters
> down the responsibility of the flow controller. What are currently clearly
> assigned responsibilities and a clear design become quite a muddy patch.
>
> I think it is a good idea to explore new and improved possibilities for flow
> descriptions. But since once a thing is in the core, it is unlikely to ever
> get out, I do not think research on how such a thing should be done should
> happen in the core. In particular, as long as changes to the core are under
> the heading of "must remain compatible" (cf. UIMA-2670, or other recent
> discussions on this list), I suggest that experimental extensions should be
> developed outside the core. There are already enough things in the core that
> should be addressed and cruft that could be removed or fixed up and better
> integrated.
>
> Since the modification you suggest appears to be possible only with
> modifications to the core and since it ignores the encapsulation of
> flow logic within the flow descriptor, I suggest trying to approach the
> user requirements from a different angle which does not have these problems.
>
> The motivation for the suggested changes appears to be related to using
> UIMA setups for experimentation which requires extensive parametrization.
> There are many people who already do that. E.g. two of the publications
> presented during the UIMA workshop last Monday were highlighting different
> approaches of building high-level experimental workflows for UIMA pipelines:
>
> CSE Framework: A UIMA-based Distributed System for Configuration Space
> Exploration 14-17
> Elmer Garduno, Zi Yang, Avner Maiberg, Collin McCormack, Yan Fang, Eric Nyberg
> in http://ceur-ws.org/Vol-1038/
>
> Bluima: a UIMA-based NLP Toolkit for Neuroscience 34-41
> Renaud Richardet, Jean-Cédric Chappelier, Martin Telefont
> in http://ceur-ws.org/Vol-1038/
>
> and similarly, although from a different venue
>
> A lightweight framework for reproducible parameter sweeping in information
> retrieval
> Richard Eckart de Castilho, Iryna Gurevych,
> http://dl.acm.org/citation.cfm?id=2064248
>
> There may be additional work on building experiments (with UIMA) that
> should be considered, e.g. the original uimaFIT publications, since
> experimentation was one of the main considerations behind its development,
> although "E" didn't make it into the name.
>
> The flow controller offers a clear extension point for the kind of
> modifications you are trying to introduce to the core outside of the flow
> controller, with the idea that existing flow controllers would not need to
> be modified.
> That has a certain appeal, but I would argue that people who want to have
> more control over the flow and parametrize it should switch to a new flow
> controller which offers that functionality. It may be more work for the
> adopters of this new controller, but it would not weaken the overall
> architecture by supplementing existing extension points with alternative
> and invasive mechanisms.
>
> -- Richard
>
>
> On 27.09.2013, at 21:54, Marshall Schor <[email protected]> wrote:
>
>> A modified proposal, with rationale from the previous discussion.
>>
>> 1) Add one attribute, not 2. Per Thilo's suggestion, make it run= rather
>> than skip=.
>> The current external override file syntax supports simple negation computing
>> that is somewhat readable, to allow changing just one value to pick an
>> alternative.
>>
>> 2) Have the attribute associated with the delegate element, rather than with
>> the flow constraint. Rationale: flow constraint is optional, and relates to
>> flow; the meaning of this specification is more concerned with choosing an
>> alternative or skipping, regardless of how many places the delegate might
>> appear in a flow.
>>
>> 3) Preserve a substantial amount of backwards compatibility. This includes
>> having previously written flow controllers continue to work unmodified. For
>> skipping, this means the delegate will still be entered into internal tables,
>> given to the flow controller, etc., but the dispatch of the CAS will be as
>> if it were done with a no-op annotator; this applies also to initialization.
>> This allows flow controllers that had a hard-coded flow to continue to work.
>> For alternatives (using the same delegate "key"), have the picked one entered
>> into the internal tables.
>>
>> 4) The CDE will need to have a way to (optionally) specify an External
>> Configuration File to use when dealing with a descriptor, and have a defined
>> strategy if no such file is available. It should store the path to the
>> specified file in a property-local setting for subsequent use in Eclipse.
>>
>> 5) UIMA-AS, if managing an aggregate as an asynchronous aggregate, may need
>> to recognize skipped delegates.
>>
>> 6) External Resource Definitions... for skipped / alternative items: These
>> might or might not be referred to by other delegates. These are handled for
>> backwards compatibility: for alternatives (sharing the same key), only the
>> one which is picked is included. For skipping: the resource is omitted only
>> if there are no references to it from non-skipped things. Otherwise it is
>> included, even though its associated annotator is skipped.
>>
>> I've probably forgotten some things...
>>
>> -Marshall