Thank you, Sean. Could you please share an example that uses LinesFromFileCollectionReader?
regards,
shyam k.

On Fri, Jan 13, 2017 at 8:19 PM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

> Hi Shyam,
>
> I'm not sure what the [4] is doing in your nextLine String processing.
>
> That aside, are you seeing the pipeline being initiated multiple times?
> This could be the problem.
>
> Your file reader looks nice, but as I advised in my last email, give
> LinesFromFileCollectionReader a try. Instead of creating a new cas object
> and initializing the pipeline once per line, this will allow ctakes to
> reuse a single cas object and initialize the pipeline only once.
>
> Sean
>
> -----Original Message-----
> From: Ks Sunder [mailto:shyam...@gmail.com]
> Sent: Friday, January 13, 2017 1:11 AM
> To: dev@ctakes.apache.org
> Subject: Re: Allergy Annotator
>
> Thank you, Sean.
>
> I have written Java code to read the CSV file, and for the cTAKES UMLS
> dictionary lookup I am using the function below.
>
> public AnalysisEngineDescription getUMLPipeline() throws
>       ResourceInitializationException, URISyntaxException{
>    AggregateBuilder builder = new AggregateBuilder();
>    builder.add(SimpleSegmentAnnotator.createAnnotatorDescription());
>    builder.add(SentenceDetector.createAnnotatorDescription());
>    builder.add(TokenizerAnnotatorPTB.createAnnotatorDescription());
>    builder.add(POSTagger.createAnnotatorDescription());
>    builder.add(ClinicalPipelineFactory.getNpChunkerPipeline());
>    builder.add(LvgAnnotator.createAnnotatorDescription());
>
>    try {
>       builder.add( AnalysisEngineFactory.createEngineDescription(
>             DefaultJCasTermAnnotator.class,
>             AbstractJCasTermAnnotator.PARAM_WINDOW_ANNOT_PRP,
>             "org.apache.ctakes.typesystem.type.textspan.Sentence",
>             JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY,
>             ExternalResourceFactory.createExternalResourceDescription(
>                   FileResourceImpl.class,
>                   FileLocator.locateFile(
>                         "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml"
>                   )
>             )
>       ) );
>    } catch ( FileNotFoundException e ) {
>       e.printStackTrace();
>       throw new ResourceInitializationException( e );
>    }
>
>    return builder.createAggregateDescription();
> }
>
> Next, I am calling this function from here:
>
> reader = new CSVReader(new FileReader(ExelReadJava.NarrativeFile));
> String [] nextLine;
> int lineNumber = 0;
>
> while ((nextLine = reader.readNext()) != null) {
>    lineNumber++;
>    System.out.println("Line # " + lineNumber);
>
>    //UML code start
>    try {
>       if(nextLine[4].length()>1 ){
>
>          final JCas jcas = JCasFactory.createJCas();
>          jcas.setDocumentText( nextLine[4] );
>          SimplePipeline.runPipeline(jcas, pipelineTesting.getUMLPipeline());
>
>          for ( IdentifiedAnnotation entity : JCasUtil.select( jcas,
>                IdentifiedAnnotation.class ) ) {
>             if(entity.getOntologyConceptArr() != null){
>
>                add.append(entity.getCoveredText()+ ",");
>             }
>          }
>
> This function works properly, but processing takes about 40 seconds per
> line. How can I decrease the processing time?
>
> I have 1 lakh (100,000) records (lines) in the CSV file.
>
> Please give me a solution and an example.
>
> regards,
> shyam k.
>
> On Thu, Jan 12, 2017 at 8:48 PM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
>
> > Hi Shyam,
> >
> > Have a look at the LinesFromFileCollectionReader class in ctakes-core.
> > It doesn't use csv files, but instead treats every newline character
> > as a separator.
> >
> > Sean
> >
> > -----Original Message-----
> > From: Ks Sunder [mailto:shyam...@gmail.com]
> > Sent: Wednesday, January 11, 2017 1:29 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: Allergy Annotator
> >
> > Hi All,
> >
> > My scenario is: read the string content from a CSV file and find the
> > medical terms in that content using the cTAKES UMLS dictionary.
> >
> > As per your suggestion, I tried to find a CollectionReader in
> > ctakes-core, but I didn't find a clear solution. Please give me a
> > solution and one example.
> >
> > regards,
> > shyam k.
> >
> > On Thu, Dec 22, 2016 at 9:16 PM, Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
> >
> > > Hi Shyam,
> > >
> > > I think that the key to your first question
> > > > how can execute the single function to run all this jobs in short time...
> > > is in your code here:
> > >
> > > 1 final JCas jcas = JCasFactory.createJCas();
> > > 2 jcas.setDocumentText( nextLine[0] );
> > > 3 SimplePipeline.runPipeline(jcas, getUMLPipeline());
> > >
> > > What you probably want to do is replace lines #1 and #2 with a
> > > CollectionReader, and then in #3 use a different SimplePipeline call
> > > that runs the pipeline using the CollectionReader instead of a
> > > static cas.
> > >
> > > There are commonly used CollectionReaders in ctakes-core. The most
> > > widely applicable is probably the FileTreeReader*, which reads a
> > > tree of ascii files. If you have some other source of text data
> > > then look around the code for something that might fit and let the
> > > devlist know if you can't find anything that fits your needs.
> > >
> > > I don't understand your second question:
> > > > how can i find sentence vised Dictionary words from string, give me a solution for this..
> > > Can you rephrase it and post to the devlist again?
> > >
> > > * One advantage that the FileTreeReader has is that it stores
> > > metadata on the input file tree placement, which can then be
> > > reproduced by output file writers like the html writer.
> > >
> > > Sean
> > >
> > > -----Original Message-----
> > > From: Ks Sunder [mailto:shyam...@gmail.com]
> > > Sent: Thursday, December 22, 2016 2:33 AM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: Allergy Annotator
> > >
> > > Hi All,
> > >
> > > I have written the code below to find medical terms in String
> > > information.
> > >
> > > step 1:
> > >
> > > public static AnalysisEngineDescription getUMLPipeline() throws
> > >       ResourceInitializationException, URISyntaxException{
> > >    AggregateBuilder builder = new AggregateBuilder();
> > >    builder.add(SimpleSegmentAnnotator.createAnnotatorDescription());
> > >    builder.add(SentenceDetector.createAnnotatorDescription());
> > >    builder.add(TokenizerAnnotatorPTB.createAnnotatorDescription());
> > >    builder.add(POSTagger.createAnnotatorDescription());
> > >    builder.add(ClinicalPipelineFactory.getNpChunkerPipeline());
> > >    builder.add(LvgAnnotator.createAnnotatorDescription());
> > >
> > >    try {
> > >       builder.add( AnalysisEngineFactory.createEngineDescription(
> > >             DefaultJCasTermAnnotator.class,
> > >             AbstractJCasTermAnnotator.PARAM_WINDOW_ANNOT_PRP,
> > >             "org.apache.ctakes.typesystem.type.textspan.Sentence",
> > >             JCasTermAnnotator.DICTIONARY_DESCRIPTOR_KEY,
> > >             ExternalResourceFactory.createExternalResourceDescription(
> > >                   FileResourceImpl.class,
> > >                   FileLocator.locateFile(
> > >                         "org/apache/ctakes/dictionary/lookup/fast/cTakesHsql.xml"
> > >                   ) )
> > >       ) );
> > >    } catch ( FileNotFoundException e ) {
> > >       e.printStackTrace();
> > >       throw new ResourceInitializationException( e );
> > >    }
> > >
> > >    return builder.createAggregateDescription();
> > > }
> > >
> > > step 2:
> > >
> > > final JCas jcas = JCasFactory.createJCas();
> > > jcas.setDocumentText( nextLine[0] );
> > > SimplePipeline.runPipeline(jcas, getUMLPipeline());
> > >
> > > for ( IdentifiedAnnotation entity : JCasUtil.select( jcas,
> > >       IdentifiedAnnotation.class ) ) {
> > >
> > >    if(entity.getOntologyConceptArr() != null){
> > >
> > >       add.append(entity.getCoveredText()+ ",");
> > >
> > >    }
> > > }
> > >
> > > It's working fine.
> > >
> > > But I have two queries:
> > >
> > > 1. In step 1, I am adding the annotators step by step, and it takes
> > > a long time to load all the functions.
> > > how can execute the single function to run all this jobs in short
> > > time...
> > >
> > > 2. how can i find sentence vised Dictionary words from string, give
> > > me a solution for this..
> > >
> > > Please give me solutions for these issues.
> > >
> > > regards,
> > > shyam k.
> > >
> > > On Thu, Dec 8, 2016 at 1:59 AM, Mullane, Sean *HS <sp...@hscmail.mcc.virginia.edu> wrote:
> > >
> > > > I'm reviving this thread with reference to negation detection. I
> > > > previously posted about this to the User list but this is probably
> > > > a more appropriate venue.
> > > >
> > > > The way the sentences are split on ":" makes the negation
> > > > annotator miss negation in lists of this form:
> > > >
> > > >    Hyperlipidemia: Yes
> > > >    Hypercholesterolemia: No
> > > >    Chronic Renal Insufficiency: N/A
> > > >
> > > > I tried reversing order and removing ":"s and found that the
> > > > negation for Hypercholesterolemia is detected when in this form:
> > > >
> > > >    Yes Hyperlipidemia
> > > >    No Hypercholesterolemia
> > > >    N/A Chronic Renal Insufficiency
> > > >
> > > > Our notes have quite a few places with this sort of list where
> > > > good negation detection is important, but I haven't had very good
> > > > results. The sentence segmentator sees this as 12 separate
> > > > sentences, but I would think proper behavior would be to consider
> > > > this as 6 sentences (breaking sentences on line break but not on
> > > > colons). I see previous discussion on the list about the sentence
> > > > segmentator breaking on newlines but little regarding colons. I
> > > > would think in most cases it would be more useful not to break on
> > > > ":". Or is there an overriding reason for the current behavior?
> > > > If changing the sentence segmentator isn't an option, is there a
> > > > different way to configure the negation detection annotator that
> > > > would avoid this issue?
> > > >
> > > > Thanks,
> > > > Sean
> > > >
> > > >
> > > > Hi,
> > > >
> > > > I am interested in the design decision of the sentence detector.
> > > >
> > > > Why does it split a sentence of the form "WORD1: WORD2 WORD3."
> > > > into two sentences "WORD1:" and "WORD2 WORD3."? Do other
> > > > components of cTAKES require such a sentence splitting?
> > > >
> > > > It would seem to me that it should remain one sentence. For
> > > > example, the smoking status detector has its own SentenceAdjuster
> > > > that merges some of such sentences back into one, because of this
> > > > design.
> > > >
> > > > Thanks, Tomasz
> > > >
> > > > ________________________________________
> > > > From: Finan, Sean [sean...@childrens.harvard.edu]
> > > > Sent: Friday, July 10, 2015 3:20 PM
> > > > To: de...@ctakes.apache.org
> > > > Subject: RE: Allergy Annotator
> > > >
> > > > Hi Tom,
> > > >
> > > > It is exactly because the sentence detector splits "KEY:" from
> > > > "VALUE" that I didn't suggest using sentences. Instead, I would
> > > > just iterate over the whole cas collection of medication events
> > > > and attempt to match allergy phrases ("allergic to medication")
> > > > with the note text spanning from event.begin-15 to event.end+15
> > > > or whatever window size you prefer.
> > > >
> > > > Sean
> > > >
> > > > -----Original Message-----
> > > > From: Tom Devel [mailto:deve...@gmail.com]
> > > > Sent: Friday, July 10, 2015 4:12 PM
> > > > To: de...@ctakes.apache.org
> > > > Subject: Re: Allergy Annotator
> > > >
> > > > Sean and Dima, these are great suggestions, thanks so far.
> > > >
> > > > Sean, when looping over medication events as you say, I can see
> > > > how it is possible to take the textspan.Sentence of this
> > > > MedicationMention, and then do a regex check for the phrase
> > > > structure as Dima said.
> > > >
> > > > But instead of textspan.Sentence, you mention "see any is included
> > > > in a phrase".
> > > > What cTAKES/UIMA class is related to this?
> > > >
> > > > Because if I would use textspan.Sentence, it would work for "The
> > > > patient is allergic to penicillin.", but cTAKES splits "ALLERGIES:
> > > > PENICILLIN, WHEAT" into two sentences, so that the
> > > > MedicationMentions here would not be in the same sentence as the
> > > > word "ALLERGIES".
> > > >
> > > > Thanks again, Tom
> > > >
> > > > On Fri, Jul 10, 2015 at 2:12 PM, Finan, Sean <sean...@childrens.harvard.edu> wrote:
> > > >
> > > > Hi Dima, Tom,
> > > >
> > > > I was thinking the same as Dima's first solution. Iterate through
> > > > the medication events and see any is included in a phrase as
> > > > mentioned in Tom's original email. Each phrase structure would
> > > > have to be specified beforehand. However, assigning appropriate
> > > > CUIs would require having a lookup table for each medication
> > > > allergy. I think that would be the simplest solution.
> > > >
> > > > Sean
> > > >
> > > > -----Original Message-----
> > > > From: Dligach, Dmitriy [mailto:dmit...@childrens.harvard.edu]
> > > > Sent: Friday, July 10, 2015 2:50 PM
> > > > To: cTAKES Developer list
> > > > Subject: Re: Allergy Annotator
> > > >
> > > > Hi Tom,
> > > >
> > > > If the patterns are pretty simple, you could just add a few rules
> > > > on top of the cTAKES dictionary lookup output. Something of the
> > > > kind "allergic to <medication>" or "allergies: <medication1>,
> > > > <medication2>, <substance1>, ...".
> > > >
> > > > If these patterns are hard to express as rules, you should
> > > > consider a machine learning based sequence labeling route (e.g.
> > > > something similar to the cTAKES chunker).
> > > >
> > > > Dima
> > > >
> > > > --
> > > > Dmitriy (Dima) Dligach, Ph.D.
> > > > Boston Children's Hospital and Harvard Medical School
> > > > (617) 651-0397
> > > >
> > > > On Jul 10, 2015, at 13:40, Tom Devel <deve...@gmail.com> wrote:
> > > >
> > > > Sean,
> > > >
> > > > It would be a wider net, such that if an allergy is mentioned in
> > > > the clinical note, this is captured in the corresponding
> > > > IdentifiedAnnotation (or alternatively, if the
> > > > IdentifiedAnnotation class should not be changed with a new
> > > > attribute, in a separate allergy annotation).
> > > >
> > > > This annotator would then have to of course run after the clinical
> > > > pipeline has run and discovered all IdentifiedAnnotations.
> > > >
> > > > I am familiar with writing UIMA/cTAKES annotators, but not sure
> > > > how a new ML method could be integrated here for detecting
> > > > allergies. Do you have any thoughts about how to approach this in
> > > > general?
> > > >
> > > > Thanks, Tom
> > > >
> > > > On Fri, Jul 10, 2015 at 11:54 AM, Finan, Sean <sean...@childrens.harvard.edu> wrote:
> > > >
> > > > Hi Tom,
> > > >
> > > > Are you interested in catching all allergies or just a few
> > > > specific allergies for a study? If you are only concerned with a
> > > > few then there is a (possibly) simple solution. If you are
> > > > interested in throwing a wider net then I think that a new module
> > > > would need to be created; does anybody reading this have an ML or
> > > > regex style module?
> > > >
> > > > Sean
> > > >
> > > > -----Original Message-----
> > > > From: Tom Devel [mailto:deve...@gmail.com]
> > > > Sent: Friday, July 10, 2015 12:42 PM
> > > > To: de...@ctakes.apache.org
> > > > Subject: Allergy Annotator
> > > >
> > > > Hi,
> > > >
> > > > I would like to use/extend cTAKES to detect allergies.
> > > >
> > > > In the cTAKES publication (2010)
> > > >
> > > > http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2995668/
> > > >
> > > > there is the mention that: "Allergies to a given medication are
> > > > handled by setting the negation attribute of that medication to
> > > > 'is negated'."
> > > >
> > > > However, in a post here in 2014 (RE: Allergy Indication) it is
> > > > said that cTAKES does not have a module for allergy discovery.
> > > >
> > > > 1. What is the current status of allergy detection in cTAKES?
> > > >
> > > > 2. I did some testing: while cTAKES discovers concepts about
> > > > allergies ("wheat allergy" is found as C0949570), using
> > > > "ALLERGIES: PENICILLIN, WHEAT" or "The patient is allergic to
> > > > penicillin." does not give the penicillin or wheat annotations
> > > > an allergy status.
> > > >
> > > > How would I go about detecting these allergy mentions?
> > > >
> > > > Thanks, Tom
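
For readers of the archive, the window-based approach suggested earlier in this thread (look for an allergy phrase in the note text from event.begin-15 to event.end+15) could be sketched roughly as below. This is only an illustration under stated assumptions, not cTAKES code: the class AllergyWindowMatcher, the method hasAllergyCue and the cue pattern are hypothetical names invented here, and the begin/end offsets are assumed to come from each medication annotation's getBegin() and getEnd().

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of cTAKES. */
final class AllergyWindowMatcher {

    // Illustrative cue words only (allergy, allergies, allergic).
    // A real cue list would need to be curated for the notes at hand.
    private static final Pattern ALLERGY_CUE =
            Pattern.compile("\\ballerg(?:y|ies|ic)\\b", Pattern.CASE_INSENSITIVE);

    private AllergyWindowMatcher() {
    }

    /**
     * Returns true if an allergy cue word occurs in the note text between
     * (begin - window) and (end + window), clamped to the text bounds.
     * Assumes 0 <= begin <= end <= noteText.length().
     */
    static boolean hasAllergyCue(String noteText, int begin, int end, int window) {
        int from = Math.max(0, begin - window);
        int to = Math.min(noteText.length(), end + window);
        return ALLERGY_CUE.matcher(noteText.substring(from, to)).find();
    }
}
```

With the two test phrases from the original post, a 15-character window finds the cue for "penicillin" in both "The patient is allergic to penicillin." and "ALLERGIES: PENICILLIN, WHEAT", but it misses "WHEAT" in the list, because the window then starts in the middle of the word "ALLERGIES". A window of 25 characters covers it. The window size is therefore a tuning choice, and for long lists after a "KEY:" heading a rule on the heading itself may work better than a fixed window.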