Re: Try 2: A proposal for a slight augmentation of aggregate component descriptors

Richard Eckart de Castilho Fri, 27 Sep 2013 14:07:20 -0700

The XML descriptor for AEs already serves two purposes: it describes
a component, and it configures it. I do not think it should be further
watered down by evolving into a DSL for workflows, in particular if that
means that workflow logic leaks outside the flow part of the descriptor.
Combining your proposal with our other proposal regarding the embedding of an
expression language in the external configuration mechanism further waters
down the responsibility of the flow controller. What are currently clearly
assigned responsibilities and a clear design become quite a muddy patch.

I think it is a good idea to explore new and improved possibilities for flow
descriptions. But since once a thing is in the core, it is unlikely to ever
get out, I do not think research on how such a thing should be done should
happen in the core. In particular, as long as changes to the core are under
the heading of "must remain compatible" (cf. UIMA-2670, or other recent
discussions on this list), I suggest that experimental extensions should be
developed outside the core. There are already enough things in the core that
should be addressed and cruft that could be removed or fixed up and better
integrated. 

Since the modification you suggest appears to be possible only with
modifications to the core, and since it ignores the encapsulation of
flow logic within the flow descriptor, I suggest approaching the
user requirements from a different angle that does not have these problems.

The motivation for the suggested changes appears to be related to using
UIMA setups for experimentation, which requires extensive parametrization.
There are many people who already do that. E.g. two of the publications
presented during the UIMA workshop last Monday highlighted different
approaches to building high-level experimental workflows for UIMA pipelines:

CSE Framework: A UIMA-based Distributed System for Configuration Space 
Exploration 14-17
Elmer Garduno, Zi Yang, Avner Maiberg, Collin McCormack, Yan Fang, Eric Nyberg
in http://ceur-ws.org/Vol-1038/

Bluima: a UIMA-based NLP Toolkit for Neuroscience 34-41
Renaud Richardet, Jean-Cédric Chappelier, Martin Telefont
in http://ceur-ws.org/Vol-1038/

and similarly, although from a different venue

A lightweight framework for reproducible parameter sweeping in information retrieval
Richard Eckart de Castilho, Iryna Gurevych
http://dl.acm.org/citation.cfm?id=2064248

There may be additional work on building experiments (with UIMA) that
should be considered, e.g. the original uimaFIT publications, since
experimentation was one of the main considerations behind its development,
although the "E" didn't make it into the name.

The flow controller offers a clear extension point for the kind of
modification you are proposing. You are trying to introduce it to the core,
outside of the flow controller, with the idea that existing flow controllers
would not need to be modified. That has a certain appeal, but I would argue
that people who want to have more control over the flow and parametrize it
should switch to a new flow controller which offers that functionality. It
may be more work for the adopters of this new controller, but it would not
weaken the overall architecture by supplementing existing extension points
with alternative and invasive mechanisms.
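
A minimal sketch of that idea, assuming a fixed flow whose controller is
configured with a set of delegate keys to skip. It is a UIMA-free model for
illustration only: the class and method names (ParametrizedFlowSketch, route)
are hypothetical and are not UIMA API. A real implementation would live in a
custom flow controller and return the corresponding steps from there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, UIMA-free model of a parametrized fixed flow. None of these
// names are UIMA API; a real implementation would be a custom flow controller.
final class ParametrizedFlowSketch {

    // Returns the delegate keys a CAS would be routed to, in declared order.
    // fixedFlow: delegate keys in the order given by the flow description.
    // skipped:   delegate keys the controller was configured to skip.
    static List<String> route(List<String> fixedFlow, Set<String> skipped) {
        List<String> steps = new ArrayList<>();
        for (String key : fixedFlow) {
            // The skip decision is made here, inside the flow logic,
            // not in the delegate declarations of the aggregate.
            if (!skipped.contains(key)) {
                steps.add(key);
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        List<String> flow = List.of("tokenizer", "tagger", "parser");
        // Prints [tokenizer, parser]
        System.out.println(route(flow, Set.of("tagger")));
    }
}
```

In this model the skipped delegate stays declared in the aggregate, and only
the controller's configuration decides whether a CAS is sent to it.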

-- Richard

On 27.09.2013, at 21:54, Marshall Schor <[email protected]> wrote:

> A modified proposal, with rationale from the previous discussion.
> 
> 1) Add one attribute, not 2.  Per Thilo's suggestion, make it the run= rather
> than the skip=.
> Current external override file syntax supports simple negation computing that
> is somewhat readable, to allow changing just one value to pick an alternative.
> 
> 2) Have the attribute associated with the delegate element, rather than with
> the flow constraint.  Rationale: flow constraint is optional, and relates to
> flow; the meaning of this specification is more concerned with choosing an
> alternative or skipping, regardless of how many places the delegate might
> appear in a flow.
> 
> 3) Preserve a substantial amount of backwards compatibility. This includes
> having previously written flow controllers continue to work unmodified.  For
> skipping, this means the delegate will still be entered into internal tables,
> given to the flow controller, etc., but the dispatch of the CAS will be as if
> it was done with a no-op annotator; this applies also to initialization.  This
> allows flow controllers that had a hard-coded flow to continue to work. For
> alternatives (using the same delegate "key"), have the picked one entered into
> the internal tables.
> 
> 4) The CDE will need to have a way to (optionally) specify an External
> Configuration File to use when dealing with a descriptor, and have a defined
> strategy if no such file is available.  It should store the path to the
> specified file in a property-local setting for subsequent use in Eclipse.
> 
> 5) UIMA-AS, if managing an aggregate as an asynchronous aggregate, may need to
> recognize skipped delegates.
> 
> 6) External Resource Definitions... for skipped / alternative items: These
> might or might not be referred to by other delegates.  These are handled for
> backwards compatibility: for alternatives (sharing the same key), only the one
> which is picked is included.  For skipping: the resource is omitted only if
> there are no references to it from non-skipped things.  Otherwise it is
> included, even though its associated annotator is skipped.
> 
> I've probably forgotten some things... 
> 
> -Marshall
