Re: [I] [Question]: Initial CTakes analysis [ctakes]

via GitHub Mon, 28 Apr 2025 10:22:36 -0700


Johnsd11 commented on issue #56:
URL: https://github.com/apache/ctakes/issues/56#issuecomment-2835960043


   Out of the box, cTakes would get you part of the way there, but would
   require several types of customization to meet your requirements.  All of
   these are the kind of customizations that most of us have had to do, so
   there's nothing new here, but they are not trivial.  As I see it, they fall
   into these categories:
   
   1. getting familiar with the cTakes Application, pipeline, annotator and
   vocabulary ecosystem
   2. choosing a vocabulary subset that gives the best coverage of the terms
   you are looking for
   3. adding one or more custom dictionaries to supply terms & synonyms that
   are not present in the chosen vocabulary
   4. maybe employing the anatomical site annotator in your pipeline
   5. deciding how to harvest and structure the data you extract from the CAS
   object which all the annotators target
   6. deciding how to deploy the application (standalone? web services host?
   multi-instance?).  Many considerations go into this, and they greatly
   affect your ability to scale.  More than one architectural solution will
   work and get you to your "fully automated" goal, but you will need to
   implement it yourself.
   
   A hint about highlighting the text: all annotations carry text offsets, so
   with these you can write code (usually JS and CSS) to do your
   highlighting.  Native cTakes does not have any graphical display
   functionality.
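
   As a rough sketch of the offset idea (in Python here for brevity; the same
   logic ports directly to JS): assume you have already exported each
   annotation as a hypothetical (begin, end, label) tuple of character
   offsets.  That tuple shape and the `highlight` helper are my own
   illustration, not part of cTakes.

```python
from html import escape


def highlight(text, annotations):
    """Wrap each annotated span of `text` in a <mark> element.

    `annotations` is a list of (begin, end, label) tuples of character
    offsets (an assumed export format, not a cTakes API). Overlapping,
    empty, or out-of-range spans are skipped to keep the sketch simple.
    """
    out = []
    cursor = 0
    for begin, end, label in sorted(annotations):
        if begin < cursor or end <= begin or end > len(text):
            continue  # skip spans this simple approach cannot render
        out.append(escape(text[cursor:begin]))  # plain text before the span
        out.append(
            '<mark class="%s">%s</mark>'
            % (escape(label, quote=True), escape(text[begin:end]))
        )
        cursor = end
    out.append(escape(text[cursor:]))  # remaining plain text
    return "".join(out)
```

   The label becomes a CSS class, so the colouring per annotation type can
   live entirely in your stylesheet.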
   
   Another hint, learned from experience: if you have many large texts (say,
   20 KB and above, with lots of potential terms to discover), you can achieve
   much better throughput by breaking them into smaller chunks at sentence
   boundaries and adjusting the offsets accordingly as you reassemble the
   chunks.  Memory requirements grow rapidly with the size of the note.
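
   A minimal sketch of that chunk-and-reassemble approach, again in Python.
   The regex sentence splitter is a naive stand-in (you would likely want a
   real sentence detector), and `chunk_at_sentences`, `shift_annotations`
   and the (begin, end, label) tuples are hypothetical names for
   illustration, not cTakes APIs.

```python
import re

# Naive sentence end: '.', '!' or '?' followed by whitespace (an assumption;
# clinical text often needs a better detector than this).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_at_sentences(text, max_chars):
    """Split `text` into (start_offset, chunk) pairs at sentence boundaries.

    Chunks are contiguous slices, so joining them reproduces `text`.
    A single sentence longer than `max_chars` is kept whole.
    """
    cuts = [m.end() for m in _SENTENCE_END.finditer(text)] + [len(text)]
    chunks = []
    start = 0
    last = 0  # most recent boundary that still fits in the current chunk
    for cut in cuts:
        if cut - start > max_chars and last > start:
            chunks.append((start, text[start:last]))
            start = last
        last = cut
    if start < len(text):
        chunks.append((start, text[start:]))
    return chunks


def shift_annotations(chunk_start, annotations):
    """Convert chunk-local (begin, end, label) offsets to whole-note offsets."""
    return [(b + chunk_start, e + chunk_start, label) for b, e, label in annotations]
```

   Each chunk is processed on its own, then its annotations are shifted by
   the chunk's start offset so they line up with the original note again.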
   
   In summary, a strong developer background is a good starting point.  To
   that you'd want to add medical informatics knowledge and experience with
   scalable architectures.  cTakes is a great kernel for your system, but be
   prepared to dive deep.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
