Hi Sean,

I just wanted to follow up on this one more time with a few questions.

Based on Peter's description of how he is using a pool of pipelines:

>  I started from scratch to create a pipeline pool that sizes itself
> according to the memory that’s available.  Each instance contains the
> complete pipeline including the Term Annotator and a re-settable JCas
> object.   I don’t use any of the thread constructs in piper files - to not
> confuse the issue.  All of this is accessed via a web service with a multi
> threaded dispatcher (SparkJava).

and our experience doing something similar, it seems that this does not
lead to crashes, at least not with the components in the Default Clinical
Pipeline. We did have crashes when we tried to access the same pipeline from
two threads, but that was expected. I just wanted to verify that you have
seen problems with this specific setup, that is, a pool of pipelines where
each one is only ever accessed by a single thread at a time. It sounds like
you have from your prior messages, but I am just trying to make sure I have
not confused something.
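For reference, the setup Peter describes (a fixed set of fully independent pipelines, each borrowed by exactly one thread at a time) can be sketched in Java with a blocking queue. This is a minimal sketch under assumptions: `PipelinePool` is a hypothetical class, and the type parameter `P` stands in for whatever object you build to hold one complete pipeline plus its own re-settable JCas. None of it is cTAKES or UIMA API.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

/**
 * Hypothetical pool of independent pipeline instances. Each instance is handed
 * to at most one thread at a time, so no pipeline is ever shared concurrently.
 * P is a placeholder for your own "one pipeline + its own JCas" holder object.
 */
final class PipelinePool<P> {
    private final BlockingQueue<P> idle;

    PipelinePool(int size, Supplier<P> factory) {
        if (size < 1) {
            throw new IllegalArgumentException("pool size must be >= 1");
        }
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            // Each call should build a completely separate pipeline instance
            // (e.g. load the piper file again); nothing is shared between them.
            idle.add(factory.get());
        }
    }

    /** Borrows an idle pipeline, runs the task on it, and always returns it. */
    <R> R withPipeline(Function<? super P, ? extends R> task) throws InterruptedException {
        P pipeline = idle.take(); // blocks until some pipeline is free
        try {
            return task.apply(pipeline);
        } finally {
            // Cannot fail: the queue's capacity equals the number of pipelines.
            idle.add(pipeline);
        }
    }
}
```

A request handler would then call something like `pool.withPipeline(p -> p.process(noteText))`, where `process` is a hypothetical method on your own wrapper. To size the pool by available memory as Peter does, one option is to divide `Runtime.getRuntime().maxMemory()` by a per-pipeline footprint that you have measured yourself.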

We are not that concerned with the initial startup cost of loading the
piper file multiple times. I am hesitant to use the wrapped thread-safe
components because we are concerned with compute time, and I suspect that
much of the time in our pipeline is spent in the DefaultJCasTermAnnotator,
so the threads would just have to wait in line.
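To make the "wait in line" concern concrete: if one shared, non-thread-safe component is guarded by a single lock (the general approach Sean describes for the TS wrappers), every caller is serialized through that component no matter how many threads are running. The sketch is illustrative only; `LockedComponent` is a hypothetical class, not the actual cTAKES wrapper code, and a `UnaryOperator` stands in for an annotator.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.UnaryOperator;

/**
 * Hypothetical lock wrapper: one shared, non-thread-safe component guarded by a
 * single lock, so calls from all threads run through it strictly one at a time.
 */
final class LockedComponent<T> {
    private final UnaryOperator<T> delegate;                     // stands in for an annotator
    private final ReentrantLock lock = new ReentrantLock(true);  // fair: favors longest waiter
    // Instrumentation only: threads currently inside the delegate, and the peak seen.
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    LockedComponent(UnaryOperator<T> delegate) {
        this.delegate = delegate;
    }

    T process(T input) {
        lock.lock(); // every other caller blocks here until the lock is released
        try {
            peak.accumulateAndGet(active.incrementAndGet(), Math::max);
            return delegate.apply(input);
        } finally {
            active.decrementAndGet();
            lock.unlock();
        }
    }

    /** Highest number of threads ever inside the delegate at once (1 with this lock). */
    int peakConcurrency() {
        return peak.get();
    }
}
```

The peak counter only demonstrates that at most one thread is ever inside the delegate. If most of the per-document time is spent in such a component, adding threads adds waiting rather than throughput, which is the trade-off a pool of independent pipelines avoids at the cost of extra memory.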

With respect to your message:

>  Anyway, if I am running on a cluster (etc.) then it is a completely
> different ballgame.  When I do that I don't bother with the TS pipeline.

When you are running on a cluster, do you just use multiple processes, one
pipeline per process?

Lastly, do you know what the piper file "threads" command actually enforces
(https://github.com/apache/ctakes/blob/trunk/ctakes-clinical-pipeline-res/src/main/resources/org/apache/ctakes/clinical/pipeline/TsDefaultFastPipeline.piper#L4)?

Thanks again for your help.
Jeff


On Sat, Mar 9, 2019 at 10:24 PM Finan, Sean <
[email protected]> wrote:

> Hi Jeff,
>
> >  I assumed the TS wrappers were so you could avoid creating multiple
> > pipelines and just run one instance of the pipeline with a separate JCAS
> > per thread.
>
> -- Your assumption was correct.  The idea is to have only a single copy of
> any resource in memory: dictionaries, ML models (which can get big), graphs,
> etc.  The other advantage is a single initialization.  It may not seem like
> a big deal, but I have inits that take more than 2 minutes, and parallel
> inits don't work too well with respect to disk thrashing.  So, if I'm just
> running a 5-minute test, I'd rather run 3 threads with a single init.  And
> on my laptop, running 3 instances of a memory hog is not really an option.
> Yes, there are still per-thread copies, but it is still more memory-friendly.
> Anyway, if I am running on a cluster (etc.) then it is a completely
> different ballgame.  When I do that I don't bother with the TS pipeline.
>
>
> > Do you know if this is a problem for any of the annotators in the
> > default clinical pipeline
>
> -- Oh yeah.  That is why I made the TS wrappers.  A good number of the
> default AEs are not thread-safe, and it only takes one to ruin your day:
> shared resources, static variables and collections, I/O ...  And some of
> the problems are not in cTAKES per se, but in 3rd-party libraries that are
> used by several standard AEs.
>
>
> > I'd like to really understand thread-safe with respect to core cTAKES
> > components
>
> -- I don't know if anybody has done a formal writeup or anything of the
> sort.  I set out to do a deep dive into the code and refactor it for thread
> safety, but quickly changed my mind.  See the mention of 3rd parties above,
> though that certainly wasn't everything.  It was easier to write the
> wrappers.  Plus, I could rubber-stamp them and quickly make any AE that I
> came across thread-safe for testing or use.
>
>
> Cheers for the curiosity,
>
> Sean
>
>
>
>
> ________________________________________
> From: Jeffrey Miller <[email protected]>
> Sent: Saturday, March 9, 2019 12:20 PM
> To: [email protected]
> Subject: Re: ctake web service [EXTERNAL]
>
> Thanks for your response, Sean. We are still working on this (and have
> some things to look into given your last response), but I will share
> details when we have it working. We are still deciding whether to use Spark
> or Apache Beam.
>
> Just to clarify my previous confusion, I assumed the TS wrappers were so
> you could avoid creating multiple pipelines and just run one instance of
> the pipeline with a separate JCAS per thread. I thought the main motivation
> behind that would be to avoid loading more than one copy of a dictionary
> into memory, for example. But it sounds like I was mistaken. With respect to
> sharing resources, are static variables the main concern? Do you know if
> this is a problem for any of the annotators in the default clinical
> pipeline (the regular components, not the thread-safe ones)? From Peter's
> response (I am not sure if that split off into another forum thread because
> the subject changed), it sounds like it may not be a problem. I'd like to
> really understand thread-safe with respect to core cTAKES components (with
> the caveat that community-created annotators could be implemented in any
> number of ways, making it hard to declare that cTAKES is "thread-safe").
> I'd be happy to contribute documentation back to the wiki once I feel I
> have a solid grasp on it.
>
> Peter, have you made your pipeline pool code available anywhere?
>
> On Fri, Mar 8, 2019 at 12:49 PM Finan, Sean <
> [email protected]> wrote:
>
> > Hi all,
> >
> > > Is there any known reason that you can't create a pipeline pool, but
> > > keep everything in the same process?
> > -- No, but ...
> > > Is it safe to load multiple pipelines in
> > > the same process as long as only one thread can access each one at a time
> > > (we plan to use this in a Spark pipeline).
> > -- If you are talking about out-of-the-box cTAKES being the process, only
> > a single pipeline will run on multiple threads.  The threads will share
> > resources, static variables, etc., and the pipeline will give you terrible
> > results and very quickly crash.  That is why I wrote the thread-safe
> > wrappers.
> > -- That being said, supposedly you can configure Spark to handle this by
> > keeping everything contained in a unique copy per thread.  Sort of like
> > ThreadLocal (I think), but more effective on a full-pipeline level.
> >
> > > it must have reduced the DefaultJCasTermAnnotator to a singleton object
> > > in memory.
> > -- Yes.  The thread-safe pipeline is not meant to have siblings in the
> > same process; the wrappers can only do so much.  That being said, I am
> > pretty sure that the Default... is thread-safe, so it doesn't actually need
> > the wrapper.  Regardless, the rest of the pipeline would crash.
> >
> > Jeff, can you share information about your efforts on Spark?  If we could
> > get that working and into standard cTAKES, it would be fantastic.
> >
> > I hope that this information is useful.
> >
> > Sean
> >
> >
> >
> > ________________________________________
> > From: Jeffrey Miller <[email protected]>
> > Sent: Friday, March 8, 2019 11:23 AM
> > To: [email protected]
> > Subject: Re: ctake web service [EXTERNAL]
> >
> > Is there any known reason that you can't create a pipeline pool, but keep
> > everything in the same process? Is it safe to load multiple pipelines in
> > the same process as long as only one thread can access each one at a time
> > (we plan to use this in a Spark pipeline). One caveat I have noticed: it
> > seems like if I use the thread-safe components to build a pipeline pool,
> > only one dictionary for the DefaultJCasTermAnnotator can be loaded per
> > process. For example, I was trying to take advantage of the ability to
> > switch pipelines via a query parameter that is hinted at in the code for
> > the REST service. The two pipelines used different ontology dictionaries,
> > but it seemed like with the thread-safe components it must have reduced
> > the DefaultJCasTermAnnotator to a singleton object in memory, because it
> > only used the first dictionary instantiated. Either way, given Sean's
> > description above of how the thread-safe components work, you probably
> > wouldn't want to use them in a pipeline pool, assuming that the problems
> > with threading are limited to multiple threads accessing the same pipeline
> > at the same time, and do not come from having multiple pipelines loaded
> > into memory, each accessed by only a single thread.
> >
> > On Fri, Mar 8, 2019 at 11:06 AM Kathy Ferro <[email protected]>
> > wrote:
> >
> > > I thought about creating a queue that acts as a traffic cop.  Only the
> > > traffic cop calls the WS.  I also want to test multiple WS instances
> > > running on different ports.  The traffic cop calls whichever WS is
> > > available and keeps track of the WS statuses.  With all this processing
> > > going on, it might kill the power for blocks.
> > >
> > > On Fri, Mar 8, 2019 at 10:34 AM Finan, Sean <
> > > [email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I guess that a quick test could be run with a multi-threaded pipeline.
> > > > Tim, for some reason I recall you checking in one with a dockerfile.
> > > > Maybe not, and it might not be the default in the service.  Anyway, you
> > > > could set the procs to something like 50 and throw 50 users at it.  It
> > > > definitely does not scale anything close to linearly.  cTAKES AEs
> > > > aren't built for thread safety, so they are all wrapped with locks and
> > > > there is a lot of thread contention.  However, running such a test
> > > > might indicate the source of the problem.
> > > >
> > > > The other option is to create a queue that collects POST calls and
> > > > doles them out serially to a single pipeline.  User #50 would probably
> > > > not appreciate it though ...
> > > > ________________________________________
> > > > From: gandhi rajan <[email protected]>
> > > > Sent: Friday, March 8, 2019 10:02 AM
> > > > To: [email protected]
> > > > Subject: Re: ctake web service [EXTERNAL]
> > > >
> > > > Hi Kathy,
> > > >
> > > > I guess the initialization happens in the post-construct method. So if
> > > > we could synchronize that, I feel we could get around the problem.
> > > > Unfortunately, I'm not able to test this, as my setup is gone with my
> > > > old job. Try it out.
> > > >
> > > > Regards,
> > > > Gandhi.
> > > >
> > > > On Friday, March 8, 2019, Kathy Ferro <[email protected]>
> > wrote:
> > > >
> > > > > Tim,
> > > > >
> > > > > Thanks for the reply.  I'm continuing the research.  With all the
> > > > > layers that wrap around this, you would think it could be handled,
> > > > > as you suggest.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Mar 7, 2019 at 8:01 PM Miller, Timothy <
> > > > > [email protected]> wrote:
> > > > >
> > > > > > That's a good question that I've also heard from others, and
> > > > > > unfortunately I don't know the answer. My use cases are typically a
> > > > > > single job at a time making sequential calls, so I wasn't stressing
> > > > > > it with multiple asynchronous calls. I would've thought that the
> > > > > > Tomcat container would have some ability to manage that though!
> > > > > > Tim
> > > > > >
> > > > > > ________________________________________
> > > > > > From: Kathy Ferro <[email protected]>
> > > > > > Sent: Thursday, March 7, 2019 6:10 PM
> > > > > > To: [email protected]
> > > > > > Subject: Re: ctake web service [EXTERNAL]
> > > > > >
> > > > > > Tim,
> > > > > >
> > > > > > Does the Docker solution handle multiple instances?  I tested the
> > > > > > REST web service with 2 requests at the same time, and it errors
> > > > > > out.  I removed the part that writes the result XML file to disk; it
> > > > > > still errors out.
> > > > > >
> > > > > > Best,
> > > > > > Kathy
> > > > > >
> > > > > > On Mon, Mar 4, 2019 at 10:52 AM Miller, Timothy <
> > > > > > [email protected]> wrote:
> > > > > >
> > > > > > > I don't know what the solution was, but I leave my cTAKES REST
> > > > > > > server running basically full time and haven't seen timeouts yet.
> > > > > > > Tim
> > > > > > >
> > > > > > > ________________________________________
> > > > > > > From: gandhi rajan <[email protected]>
> > > > > > > Sent: Monday, March 4, 2019 10:43 AM
> > > > > > > To: [email protected]
> > > > > > > Subject: Re: ctake web service [EXTERNAL]
> > > > > > >
> > > > > > > Hi Kathy, Sean did respond that there is no timeout happening on
> > > > > > > the cTAKES end. You will probably have to look at the database
> > > > > > > settings for this closed connection issue.
> > > > > > >
> > > > > > > Does anyone have any clue on this?
> > > > > > >
> > > > > > > On Monday, March 4, 2019, Kathy Ferro <[email protected]>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Gandhi,
> > > > > > > >
> > > > > > > > Did you get any response to this issue?  Does it try to keep
> > > > > > > > the connection open while the WS is up? Or does it open and
> > > > > > > > close it after it's done?
> > > > > > > >
> > > > > > > > We are still getting this error:
> > > > > > > > "ERROR JdbcRareWordDictionary - No operations allowed after statement closed."
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > Kathy
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Fri, Aug 17, 2018 at 9:43 AM Gandhi Rajan Natarajan <
> > > > > > > > [email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi Kathy,
> > > > > > > > >
> > > > > > > > > Some time back we encountered this issue, and the problem
> > > > > > > > > seems to be DB connections getting timed out.
> > > > > > > > >
> > > > > > > > > Currently we are using the following implementations:
> > > > > > > > >
> > > > > > > > > "org.apache.ctakes.dictionary.lookup2.dictionary.JdbcRareWordDictionary"
> > > > > > > > > and "org.apache.ctakes.dictionary.lookup2.concept.JdbcConceptFactory"
> > > > > > > > >
> > > > > > > > > Is anybody aware of any timeout settings that need to be
> > > > > > > > > configured in these implementations to avoid the DB connection
> > > > > > > > > timeout issue?
> > > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Kathy Ferro <[email protected]>
> > > > > > > > > Sent: Thursday, August 16, 2018 11:07 PM
> > > > > > > > > To: [email protected]
> > > > > > > > > Subject: ctake web service
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > Just want to see if anybody has experienced this issue.
> > > > > > > > >
> > > > > > > > > If the web service has been up for a day or two, it drops the
> > > > > > > > > dictionary lookup.  The only results it returns are
> > > > > > > > > ConllDependencyNode tags in the XMI file; no mentions, no
> > > > > > > > > concepts, etc.
> > > > > > > > >
> > > > > > > > > I haven't had a chance to investigate it yet.
> > > > > > > > >
> > > > > > > > > Kathy
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Regards,
> > > > > > > Gandhi
> > > > > > >
> > > > > > > "The best way to find urself is to lose urself in the service of others !!!"
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > >
> >
>
