+1, I agree completely.  The AE descriptors should not contain
flow logic; that should go elsewhere.

--Thilo

On 09/27/2013 11:06 PM, Richard Eckart de Castilho wrote:
The XML descriptor for AEs already serves two purposes: it describes
a component, and it configures it. I do not think it should be further
watered down by evolving into a DSL for workflows, in particular if that
means that workflow logic is leaking outside the flow part of the descriptor.
Combining your proposal with our other proposal regarding the embedding of an
expression language in the external configuration mechanism further waters
down the responsibility of the flow controller. What is currently a clear
design with clearly assigned responsibilities becomes quite a muddy patch.

I think it is a good idea to explore new and improved possibilities for flow
descriptions. But since once a thing is in the core, it is unlikely to ever
get out, I do not think research on how such a thing should be done should
happen in the core. In particular, as long as changes to the core are under
the heading of "must remain compatible" (cf. UIMA-2670, or other recent
discussions on this list), I suggest that experimental extensions should be
developed outside the core. There are already enough things in the core that
should be addressed and cruft that could be removed or fixed up and better
integrated.

Since the modification you suggest appears to be possible only with
modifications to the core, and since it ignores the encapsulation of
flow logic within the flow descriptor, I suggest trying to approach the
user requirements from a different angle that does not have these problems.

The motivation for the suggested changes appears to be related to using
UIMA setups for experimentation, which requires extensive parametrization.
There are many people who already do that. E.g. two of the publications
presented during the UIMA workshop last Monday were highlighting different
approaches to building high-level experimental workflows for UIMA pipelines:

CSE Framework: A UIMA-based Distributed System for Configuration Space
Exploration (pp. 14-17)
Elmer Garduno, Zi Yang, Avner Maiberg, Collin McCormack, Yan Fang, Eric Nyberg
in http://ceur-ws.org/Vol-1038/

Bluima: a UIMA-based NLP Toolkit for Neuroscience (pp. 34-41)
Renaud Richardet, Jean-Cédric Chappelier, Martin Telefont
in http://ceur-ws.org/Vol-1038/

and similarly, although from a different venue

A lightweight framework for reproducible parameter sweeping in information
retrieval
Richard Eckart de Castilho, Iryna Gurevych
http://dl.acm.org/citation.cfm?id=2064248

There may be additional work on building experiments (with UIMA) that
should be considered, e.g. the original uimaFIT publications, since
experimentation was one of the main considerations behind its development,
although "E" didn't make it into the name.

The flow controller offers a clear extension point for the kind of
modifications you are proposing. You are trying to introduce them into the
core, outside of the flow controller, with the idea that existing flow
controllers would not need to be modified.
That has a certain appeal, but I would argue that people who want to have
more control over the flow and parametrize it should switch to a new flow
controller which offers that functionality. It may be more work for the
adopters of this new controller, but it would not weaken the overall
architecture by supplementing existing extension points with alternative
and invasive mechanisms.
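
To make the alternative concrete, here is a minimal sketch of what "a flow
controller which offers that functionality" could look like. It is written in
plain Python rather than against the UIMA Java API, and every name in it
(ParametrizedFlowController, run_aggregate, the dict standing in for a CAS)
is hypothetical. The only point it illustrates is that the skip decision can
be a parameter of the flow controller, so the delegate descriptors stay
untouched.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical stand-ins, not the UIMA API: a "CAS" is a plain dict and a
# delegate is any callable that annotates it in place.
Cas = Dict[str, object]
Delegate = Callable[[Cas], None]


@dataclass
class ParametrizedFlowController:
    """Fixed-order flow whose skip decisions are flow controller parameters."""

    order: List[str]                             # delegate keys, in flow order
    skip: Set[str] = field(default_factory=set)  # keys to leave out of the flow

    def route(self) -> List[str]:
        # All flow logic lives here; delegate descriptors are not consulted.
        return [key for key in self.order if key not in self.skip]


def run_aggregate(delegates: Dict[str, Delegate],
                  controller: ParametrizedFlowController,
                  cas: Cas) -> Cas:
    """Send the CAS to each delegate the controller routes it to."""
    for key in controller.route():
        delegates[key](cas)
    return cas
```

Switching an experiment from one configuration to another then means
changing the skip parameter of the controller, for example
ParametrizedFlowController(order=["tokenizer", "tagger"], skip={"tagger"}),
and nothing else.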

-- Richard


On 27.09.2013, at 21:54, Marshall Schor <[email protected]> wrote:

A modified proposal, with rationale from the previous discussion.

1) Add one attribute, not 2.  Per Thilo's suggestion, make it run= rather
than skip=.
The current external override file syntax supports simple negation that is
somewhat readable, which allows changing just one value to pick an alternative.

2) Have the attribute associated with the delegate element, rather than with the
flow constraint.  Rationale: flow constraint is optional, and relates to flow;
the meaning of this specification is more concerned with choosing an alternative
or skipping, regardless of how many places the delegate might appear in a flow.

3) Preserve a substantial amount of backwards compatibility. This includes
having previously written flow controllers continue to work unmodified.  For
skipping, this means the delegate will still be entered into internal tables,
given to the flow controller, etc., but the dispatch of the CAS will be as if it
was done with a no-op annotator; this applies also to initialization.  This
allows flow controllers that had a hard-coded flow to continue to work. For
alternatives (using the same delegate "key"), have the picked one entered into
the internal tables.

4) The CDE will need to have a way to (optionally) specify an External
Configuration File to use when dealing with a descriptor, and have a defined
strategy if no such file is available.  It should store the path to the
specified file in a property-local setting for subsequent use in Eclipse.

5) UIMA-AS, if managing an aggregate as an asynchronous aggregate, may need to
recognize skipped delegates.

6) External Resource Definitions... for skipped / alternative items: These might
or might not be referred to by other delegates.  These are handled for backwards
compatibility: for alternatives (sharing the same key), only the one which is
picked is included.  For skipping: the resource is omitted only if there are no
references to it from non-skipped things.  Otherwise it is included, even though
its associated annotator is skipped.
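
The semantics in points 1, 3 and 6 can be sketched roughly as follows. This
is plain Python, not UIMA code, and all names (DelegateSpec, build_table,
required_resources, no_op) are hypothetical; the run field only models the
proposed run= attribute. The sketch also assumes that at most one alternative
per key has run set to true, which the proposal does not spell out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Cas = Dict[str, object]          # stand-in for a CAS
Process = Callable[[Cas], None]


def no_op(cas: Cas) -> None:
    """Stand-in for a skipped delegate: accepts the CAS and does nothing."""


@dataclass
class DelegateSpec:
    key: str                     # delegate key, as seen by the flow controller
    process: Process
    run: bool = True             # hypothetical model of the run= attribute
    resources: Set[str] = field(default_factory=set)


def build_table(specs: List[DelegateSpec]) -> Dict[str, Process]:
    """Build the key -> delegate table handed to an unmodified flow controller."""
    table: Dict[str, Process] = {}
    picked: Set[str] = set()
    for spec in specs:
        if spec.run:
            if spec.key in picked:
                raise ValueError(f"more than one alternative picked for {spec.key!r}")
            picked.add(spec.key)
            table[spec.key] = spec.process     # the picked alternative is entered
        else:
            table.setdefault(spec.key, no_op)  # skipped, but the key stays known
    return table


def required_resources(specs: List[DelegateSpec]) -> Set[str]:
    """A resource is kept if any non-skipped delegate still refers to it."""
    needed: Set[str] = set()
    for spec in specs:
        if spec.run:
            needed |= spec.resources
    return needed
```

Because a skipped key still maps to a callable, a flow controller with a
hard-coded list of keys keeps working: it dispatches to the key as before and
the CAS simply passes through unchanged.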

I've probably forgotten some things...

-Marshall
