[
https://issues.apache.org/jira/browse/UIMA-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880467#comment-16880467
]
Marshall Schor commented on UIMA-6057:
--------------------------------------
Thanks, this helps a lot.
Here's what I see, please tell me if this is what you're intending:
# An Analysis Engine (AE1) is created from some descriptor, and initialized.
# Its process method is called, using a CAS from that AE1.
# An annotator in the AE1 pipeline creates another Analysis Engine (AE2),
using some other descriptor; it is initialized.
# The annotator then calls AE2 process method, passing the CAS associated with
AE1.
Assuming this is an accurate description of the intent, this use-case was never
contemplated by the UIMA designers, I think. There are multiple issues, only
one of which is the one reported in this Jira.
Having said that, this kind of use case (running a pipeline as a
"subroutine" of another pipeline) comes up repeatedly, and UIMA could be
extended to support it better.
Here are some of the issues with the above method in the current framework;
there may be more (please add to the list if you know of others).
# UIMA's APIs are split into 2 kinds: the ones Annotators call (mostly
concerned with creating / fetching feature structure values from the CAS,
running iterators over indexes), and the Application APIs (concerned with
creating pipelines from descriptors, and running pipelines). In between these
APIs is much of the functionality of the UIMA framework, including things like
sequencing Annotators, integrating remote annotators, setting up shared
external resources, providing for configuration parameters, etc.
## When a CAS is not inside a pipeline, but is being referenced by the
Application APIs, the envisioned use is that the application API runs multiple
"documents" through the pipeline, and "resets" the CAS at the end of each run,
and sets it up for the next document. This design is because the CAS is a
rather heavyweight structure, taking time to create, but once set up, can be
"reset" quickly.
## When the CAS enters the pipeline, every time it temporarily exits the UIMA
framework to enter a user's annotator code, a bit is set to "lock" the CAS to
block annotators from accidentally calling "reset" on the CAS while it is in
the pipeline. When the annotator finishes and returns to the framework, it is
unlocked.
## This new use case results in a "locked" CAS being sent to AE2, and when that
pipeline exits, the CAS is returned to AE1 in an unlocked state. This is
probably a minor issue, of no import, as long as the Annotator doesn't
accidentally try to reset the CAS.
# When the framework calls an Annotator's process method, it uses information
from that Annotator's metadata to set up the "result specification" - a set of
what types and features ought to be produced. Since AE2's pipeline has a
completely independent type system specification, the result specification is
in terms of that type system. If that type system doesn't match AE1's exactly,
then the result specification won't match the type system of the CAS being sent
through the pipeline. Impact of this: varies, because most annotators do not
make any use of the result specification.
# The issue from this Jira - the framework assumes that when a pipeline being
run in a Pear context returns, the Pear context exits.
# JMX counting / logging of time vs annotators : the inner pipeline's time is
counted more than once, because it is also charged to the outer AE's time.
# UIMA-AS (which provides for flexible remoting and replication of annotators)
might have other issues - don't know yet. Maybe [~cwiklik] can weigh in.
# If the user's initialize methods in the AE2's annotators or shared external
resources make use of the type system, they would be set up in the context of
AE2's produceAE, and subsequently run using AE1's type system, which may cause
other issues.
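To make the lock-bit point concrete, here is a minimal plain-Java sketch of the behaviour described in the first item. It is a toy model only: ToyCas, ToyAnnotator and ToyPipeline are hypothetical names invented for illustration, not UIMA classes, and the real framework's locking is more involved.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the CAS lock bit; none of these types are UIMA APIs. */
final class CasLockSketch {

    /** Hypothetical stand-in for a CAS that refuses reset() while locked. */
    static final class ToyCas {
        private boolean locked;

        void lock() { locked = true; }
        void unlock() { locked = false; }
        boolean isLocked() { return locked; }

        void reset() {
            if (locked) {
                throw new IllegalStateException("reset while CAS is in a pipeline");
            }
        }
    }

    /** Hypothetical stand-in for user annotator code. */
    interface ToyAnnotator {
        void process(ToyCas cas);
    }

    /** Hypothetical pipeline: locks before each annotator call, unlocks after it. */
    static final class ToyPipeline {
        private final List<ToyAnnotator> annotators = new ArrayList<>();

        ToyPipeline add(ToyAnnotator annotator) {
            annotators.add(annotator);
            return this;
        }

        void process(ToyCas cas) {
            for (ToyAnnotator annotator : annotators) {
                cas.lock();
                try {
                    annotator.process(cas);
                } finally {
                    cas.unlock(); // unconditional: assumes nobody else holds the lock
                }
            }
        }
    }

    /** Runs AE2 as a "subroutine" of AE1 and records the lock state seen by AE1. */
    static boolean[] runNested() {
        ToyCas cas = new ToyCas();
        boolean[] seen = new boolean[2];
        ToyPipeline ae2 = new ToyPipeline().add(c -> { });
        ToyPipeline ae1 = new ToyPipeline().add(c -> {
            seen[0] = c.isLocked(); // locked on entry to AE1's annotator
            ae2.process(c);         // inner pipeline unlocks on its way out
            seen[1] = c.isLocked(); // still inside AE1's annotator
        });
        ae1.process(cas);
        return seen;
    }
}
```

In this model, runNested() reports the CAS as locked before the inner call and unlocked after it, so a reset() issued by AE1's annotator at that point would no longer be blocked.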
In your use case, is it always true that AE2's type system and index
specification exactly match AE1's?
That's all for now. I'm not sure what the best way forward for this is... I'm
thinking more about it and other opinions are welcome.
> Avoid falsely switching classloader
> -----------------------------------
>
> Key: UIMA-6057
> URL: https://issues.apache.org/jira/browse/UIMA-6057
> Project: UIMA
> Issue Type: Bug
> Components: Core Java Framework
> Reporter: Matthias Koch
> Priority: Major
> Attachments: UIMA-6057.diff, classloadertest.zip
>
>
> In some cases the classloader is switched back, although it hasn't been
> switched before processing.
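As an illustration of the general idea in the description above (switch back only if a switch actually happened), here is a minimal plain-Java sketch. It uses the thread context class loader as a stand-in and is not the attached UIMA-6057.diff nor the framework's actual mechanism; ClassLoaderSwitchSketch, runWith and demo are hypothetical names.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Generic guard pattern; an assumption-laden sketch, not UIMA code. */
final class ClassLoaderSwitchSketch {

    /** Runs task under target, restoring the previous loader only if we switched. */
    static void runWith(ClassLoader target, Runnable task) {
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        boolean switched = target != null && target != previous;
        if (switched) {
            thread.setContextClassLoader(target);
        }
        try {
            task.run();
        } finally {
            if (switched) {
                thread.setContextClassLoader(previous);
            }
        }
    }

    /** Nested use: the inner call must not end the outer call's loader context. */
    static boolean[] demo() {
        Thread thread = Thread.currentThread();
        ClassLoader original = thread.getContextClassLoader();
        // Hypothetical "Pear-like" loader with no URLs, for illustration only.
        ClassLoader pear = new URLClassLoader(new URL[0], original);
        boolean[] ok = new boolean[2];
        runWith(pear, () -> {
            runWith(pear, () -> { });                      // inner "pipeline"
            ok[0] = thread.getContextClassLoader() == pear; // outer context intact
        });
        ok[1] = thread.getContextClassLoader() == original; // restored at the end
        return ok;
    }
}
```

The design choice here is to remember whether a switch took place rather than assume that returning from a pipeline always ends the special class loader context.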