[jira] [Commented] (UIMA-6057) Avoid falsely switching classloader

Marshall Schor (JIRA) Mon, 08 Jul 2019 08:29:05 -0700


    [ 
https://issues.apache.org/jira/browse/UIMA-6057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880467#comment-16880467
 ]


Marshall Schor commented on UIMA-6057:
--------------------------------------

Thanks, this helps a lot.  
Here's what I see, please tell me if this is what you're intending:
 # An Analysis Engine (AE1) is created from some descriptor, and initialized.
 # Its process method is called, using a CAS from that AE1.
 # An annotator in the AE1 pipeline creates another Analysis Engine (AE2), 
using some other descriptor; it is initialized.
 # The annotator then calls AE2's process method, passing the CAS associated 
with AE1.

Assuming this is an accurate description of the intent, I think this use case 
was never contemplated by the UIMA designers.  There are multiple issues, only 
one of which is the one reported in this Jira. 

Having said that, this kind of use case (running a pipeline as a 
"subroutine" of another pipeline) is a recurring one, and UIMA could be 
extended to support it better.

Here are some of the issues with the above method in the current framework; 
there may be more (please add to the list if you know of others):
 # UIMA's APIs are split into 2 kinds: the ones Annotators call (mostly 
concerned with creating / fetching feature structure values from the CAS, 
running iterators over indexes), and the Application APIs (concerned with 
creating pipelines from descriptors, and running pipelines).  In between these 
APIs is much of the functionality of the UIMA framework, including things like 
sequencing Annotators, integrating remote annotators, setting up shared 
external resources, providing for configuration parameters, etc.
 ## When a CAS is not inside a pipeline, but is being referenced by the 
Application APIs, the envisioned use is that the application API runs multiple 
"documents" through the pipeline, and "resets" the CAS at the end of each run, 
and sets it up for the next document.  This design is because the CAS is a 
rather heavyweight structure, taking time to create, but once set up, can be 
"reset" quickly.
 ## When the CAS enters the pipeline, every time it temporarily exits the UIMA 
framework to enter a user's annotator code, a bit is set to "lock" the CAS to 
block annotators from accidentally calling "reset" on the CAS while it is in 
the pipeline.  When the annotator finishes and returns to the framework, it is 
unlocked.
 ## This new use case results in a "locked" CAS being sent to AE2, and when that 
pipeline exits, the CAS is returned to AE1 in an unlocked state.  This is 
probably a minor issue, of no import, as long as the Annotator doesn't 
accidentally try to reset the CAS.
 # When the framework calls an Annotator's process method, it uses information 
from that Annotator's metadata to set up the "result specification" - a set of 
what types and features ought to be produced.  Since AE2's pipeline has a 
completely independent type system specification, the result specification is 
in terms of that type system.  If that type system doesn't match AE1's exactly, 
then the result specification won't match the type system of the CAS being sent 
through the pipeline.  The impact of this varies, because most annotators do 
not make any use of the result specification.
 # The issue from this Jira: the framework assumes that when a pipeline being 
run in a Pear context returns, the Pear context exits.
 # JMX counting / logging of time vs annotators: the inner pipeline's time is 
counted more than once, since it is also counted against the outer AE's time.
 # UIMA-AS (which provides for flexible remoting and replication of annotators) 
might have other issues - don't know yet.  Maybe [~cwiklik] can weigh in.
 # If the user's initialize methods in the AE2's annotators or shared external 
resources make use of the type system, they would be set up in the context of 
AE2's produceAE, and subsequently run using AE1's type system, which may cause 
other issues.
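
To make the locking point in the first item concrete, here is a toy Java 
model.  It is only a sketch under assumptions: ToyCas, ToyPipeline and 
ToyAnnotator are hypothetical stand-ins invented for illustration, not UIMA 
classes, and the real framework's locking is more involved than a single 
boolean.

```java
public class CasLockSketch {

    /** Toy stand-in for a CAS (hypothetical): only a "locked" bit and a reset guard. */
    static final class ToyCas {
        private boolean locked;

        void lock() { locked = true; }
        void unlock() { locked = false; }
        boolean isLocked() { return locked; }

        /** Mimics the guard described above: reset is refused while locked. */
        void reset() {
            if (locked) {
                throw new IllegalStateException("reset called while CAS is in a pipeline");
            }
        }
    }

    /** Toy stand-in for user annotator code (hypothetical). */
    interface ToyAnnotator {
        void process(ToyCas cas);
    }

    /** Toy pipeline (hypothetical): lock before entering user code, unlock on return. */
    static final class ToyPipeline {
        private final ToyAnnotator annotator;

        ToyPipeline(ToyAnnotator annotator) { this.annotator = annotator; }

        void process(ToyCas cas) {
            cas.lock();
            try {
                annotator.process(cas);
            } finally {
                cas.unlock();
            }
        }
    }

    /** Lock state seen by the outer annotator before and after the nested call. */
    static boolean[] lockStateAroundNestedCall() {
        boolean[] seen = new boolean[2];
        ToyPipeline ae2 = new ToyPipeline(cas -> { /* inner annotator does nothing */ });
        ToyPipeline ae1 = new ToyPipeline(cas -> {
            seen[0] = cas.isLocked();  // locked on entry to AE1's annotator
            ae2.process(cas);          // nested run unlocks the CAS when it exits
            seen[1] = cas.isLocked();  // unlocked, although still inside AE1
        });
        ae1.process(new ToyCas());
        return seen;
    }

    public static void main(String[] args) {
        boolean[] seen = lockStateAroundNestedCall();
        System.out.println("locked before nested call: " + seen[0]);
        System.out.println("locked after nested call:  " + seen[1]);
    }
}
```

In this model the outer annotator sees the CAS locked before the nested call 
and unlocked after it, so a later accidental reset() inside AE1's annotator 
would no longer be blocked.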

In your use case, do AE2's type system and index specification always match 
AE1's exactly?

That's all for now.  I'm not sure what the best way forward for this is...  I'm 
thinking more about it and other opinions are welcome.

> Avoid falsely switching classloader
> -----------------------------------
>
>                 Key: UIMA-6057
>                 URL: https://issues.apache.org/jira/browse/UIMA-6057
>             Project: UIMA
>          Issue Type: Bug
>          Components: Core Java Framework
>            Reporter: Matthias Koch
>            Priority: Major
>         Attachments: UIMA-6057.diff, classloadertest.zip
>
>
> In some cases the classloader is switched back, although it hasn't been 
> switched before processing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
