Hi Ryan,

You made some excellent progress. cTAKES is a little complicated for new users, especially anybody who isn't familiar with Java.

Since you are going to be running from a command line (via Python) and have already done so successfully, we can just try to get you set up to repeat that process.

In Eclipse, you should be able to run the maven "package" configuration. That will compile and build an installation similar to what you were using before.

After you execute maven package, open the directory ctakes-distribution/target/
There should be a .zip file named apache-ctakes-4.0.1-SNAPSHOT-bin
That zip file contains a cTAKES installation for Windows. Unzip the installation wherever you like, preferably without spaces in directory names.

You should be able to treat this new installation just like you did the one downloaded from the cTAKES website.

Before you do all of that ...
We should change a couple of things in that SentenceFirstCuiWriter to output blanks where procedures or CUIs are not discovered for your snippets.

>> public class SentenceFirstCuiWriter extends AbstractJCasFileWriter {
>>
>>    public void writeFile( final JCas jCas, final String outputDir,
>>                           final String documentId, final String fileName
>>    ) throws IOException {
>>       File cuiFile = new File( outputDir, fileName + "_cui.txt" );
>>       Map<Sentence, Collection<ProcedureMention>> sentenceMap
>>             = JCasUtil.indexCovered( jCas, Sentence.class,
>>                                      ProcedureMention.class );
>>       List<Collection<ProcedureMention>> sortedSentenceProcedures
>>             = sentenceMap.entrySet()
>>                          .stream()
>>                          .sorted( Map.Entry.comparingByKey(
>>                                DefaultAspanComparator.INSTANCE ) )
>>                          .map( Map.Entry::getValue )
>>                          .collect( Collectors.toList() );
>>       try ( Writer writer = new BufferedWriter( new FileWriter( cuiFile ) ) ) {
>>          for ( Collection<ProcedureMention> procedures :
>>                sortedSentenceProcedures ) {
>>             ProcedureMention firstProcedure
>>                   = procedures.stream()
>>                               .min( Comparator.comparingInt(
>>                                     ProcedureMention::getBegin ) )
>>                               .orElse( null );
>>             if ( firstProcedure != null ) {
---------- Change the above line to
            if ( firstProcedure == null ) {
               writer.write( "\n" );
            } else {
>>                String cui
>>                      = OntologyConceptUtil.getCuis( firstProcedure )
>>                                           .stream()
>>                                           .findFirst()
>>                                           .orElse( "" );
>>                if ( !cui.isEmpty() ) {
--------- Change the above line to
               if ( cui.isEmpty() ) {
                  writer.write( "\n" );
               } else {
>>                   writer.write( cui + "\n" );
>>                }
>>             }
>>          }
>>       }
>>    }
>> }

So, after
1. Editing the SentenceFirstCuiWriter
2. Running the maven package step
3. Unzipping your cTAKES installation

You should be able to
1. Run cTAKES from command line like you did before
2. Use the custom piper file
3. Resolve the first-discovered procedure for a snippet on each line
4. Write file(s) with corresponding line-by-line CUIs or empty lines where none are resolved

Let me know if I missed anything.

Sean
________________________________________
From: Ryan Young <royo...@buffalo.edu>
Sent: Monday, March 30, 2020 9:44 PM
To: dev@ctakes.apache.org
Subject: Re: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code (CUI) [EXTERNAL]

* External Email - Caution *

Hello Sean,

I have run into some difficulty actually running the script you wrote (SentenceFirstCuiWriter.java). I spent the last week doing the following:
1.) Installed cTAKES developer version using Eclipse IDE
2.) Added the appropriate import statements at the beginning of SentenceFirstCuiWriter.java
3.) Placed SentenceFirstCuiWriter.java in this directory:
C:\Users\Ryan\eclipse-workspace\ctakes\ctakes-core\src\main\java\org\apache\ctakes\core\cc
4.) Successfully built and compiled cTAKES developer version
5.) Successfully ran the test configurations which were already in cTAKES in Eclipse (Run --> Run As --> Maven test)

My main question is how do I run the cTAKES developer version from command line without running Eclipse or Maven?

I found a post you made last year ( https://urldefense.proofpoint.com/v2/url?u=http-3A__mail-2Darchives.apache.org_mod-5Fmbox_ctakes-2Ddev_201907.mbox_-253C1563805239741.31947-2540childrens.harvard.edu-253E&d=DwIFaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=ilUJmT8axx_RhXR_47XCxeR_aqswpoVXkSF5HQAxASQ&s=dxIE3QRB6OI1CxljCVx7K9Lgih-ymSq-wou0LqCvkvk&e= ). You stated,
*"You can put PipelineBuilder in any main(..) method and then start that main(..) from a command line just as you would any other java program. Just like any other java program, you need to have your $CLASSPATH set correctly and, for memory use, increase your maximum memory with -Xmx . These are VM options."*

I think this is what I have to do. But, I am unsure of how to accomplish this exactly. What I have tried already is:
1.) Launch Command Prompt
2.) Change directory to where PipelineBuilder.java is located
cd C:\Users\Ryan\eclipse-workspace\ctakes\ctakes-core\src\main\java\org\apache\ctakes\core\pipeline\PipelineBuilder.java
3.) Enter the following into Command Prompt
java org.apache.ctakes.core.pipeline.PiperFileRunner -p C:\Users\Ryan\SkyDrive\Desktop\Piper_File.piper -i C:\Users\Ryan\SkyDrive\Desktop\Input_Folder --writeXmis C:\Users\Ryan\SkyDrive\Desktop\Output_Folder

I receive the following error in Command Prompt:
Error: Could not find or load main class org.apache.ctakes.core.pipeline.PiperFileRunner

I am probably missing something. Just not sure what exactly. I'm not too familiar with Java. The documentation I have been reading hasn't been as helpful since cTAKES is a much more complex project than the simple examples they provide.

Lastly, I am using Windows 10.

Thank You,

Ryan Young
MD/MBA Candidate
Jacobs School of Medicine & Biomedical Sciences

On Mon, Mar 23, 2020 at 3:28 PM Ryan Young <royo...@buffalo.edu> wrote:

> Hello Sean,
>
> Wow! This was a lot more than I was anticipating! Thank you very much!
>
> To answer your questions...
> • I am using Windows 10
> • I have the Python script call a shell command to run a batch file. The
> batch file just contains the following line:
> "C:\cTAKES_4.0.0\bin\runPiperFile.bat" -p "C:\path\to\piper.piper"
> • The Python script waits for the shell command to complete (i.e., when
> cTAKES is finished processing)
> • The Python script will then parse the output text files and then
> continue on with the code
>
> Prior to calling cTAKES, the surgery list is in a Pandas dataframe. The
> workaround I had created was to save each line of the surgery list column
> in the dataframe to a different text file to make it easier for when I had
> to parse the output cTAKES text file. As I had mentioned previously, I
> would like to have just 1 input text file and 1 output text file (as long
> as the output file can be easily parsed by Python).
>
> Regarding my coding background, I don't have much background in Java.
> However, a few years ago, I had no knowledge of Python either, but I was
> able to teach myself while in medical school.
>
> A few more questions for you...
> 1.) Should I save the code you posted in the following location as a .jar
> file?
> C:\cTAKES_4.0.0\lib\SentenceFirstCuiWriter.jar
>
> 2.) Should I replace "add CuiLookupLister" with "add
> SentenceFirstCuiWriter" in the piper file or do I need both?
>
> 3.) If the SentenceFirstCuiWriter is unable to find a valid CUI, will it
> leave a blank, N/A, or NaN value? Having any of these values would
> definitely help when I have Python parse the output text file. When I have
> Python read the output text file, I would have it delete any dataframe rows
> with NaN or N/A in the CUI column.
>
> Thank you very much for your assistance!
>
> Ryan Young
> MD/MBA Candidate
> Jacobs School of Medicine & Biomedical Sciences
>
> On Mon, Mar 23, 2020 at 1:01 PM Finan, Sean <
> sean.fi...@childrens.harvard.edu> wrote:
>
>> Hi Ryan,
>>
>> Here is some code for a writer that will do what you want.
>> To use it, get rid of those first two lines in the piper that I sent
>> (set, reader).
>> The default reader will work just fine, and it will allow you to process
>> multiple surgery lists in one run.
>>
>> Then just add SentenceFirstCuiWriter to the end of your piper.
>>
>> Sean
>>
>>
>> public class SentenceFirstCuiWriter extends AbstractJCasFileWriter {
>>
>>    public void writeFile( final JCas jCas, final String outputDir,
>>                           final String documentId, final String fileName
>>    ) throws IOException {
>>       File cuiFile = new File( outputDir, fileName + "_cui.txt" );
>>       Map<Sentence, Collection<ProcedureMention>> sentenceMap
>>             = JCasUtil.indexCovered( jCas, Sentence.class,
>>                                      ProcedureMention.class );
>>       List<Collection<ProcedureMention>> sortedSentenceProcedures
>>             = sentenceMap.entrySet()
>>                          .stream()
>>                          .sorted( Map.Entry.comparingByKey(
>>                                DefaultAspanComparator.INSTANCE ) )
>>                          .map( Map.Entry::getValue )
>>                          .collect( Collectors.toList() );
>>       try ( Writer writer = new BufferedWriter( new FileWriter( cuiFile ) ) ) {
>>          for ( Collection<ProcedureMention> procedures :
>>                sortedSentenceProcedures ) {
>>             ProcedureMention firstProcedure
>>                   = procedures.stream()
>>                               .min( Comparator.comparingInt(
>>                                     ProcedureMention::getBegin ) )
>>                               .orElse( null );
>>             if ( firstProcedure != null ) {
>>                String cui
>>                      = OntologyConceptUtil.getCuis( firstProcedure )
>>                                           .stream()
>>                                           .findFirst()
>>                                           .orElse( "" );
>>                if ( !cui.isEmpty() ) {
>>                   writer.write( cui + "\n" );
>>                }
>>             }
>>          }
>>       }
>>    }
>> }
>>
>> ________________________________________
>> From: Ryan Young <royo...@buffalo.edu>
>> Sent: Monday, March 23, 2020 11:02 AM
>> To: dev@ctakes.apache.org
>> Subject: Configure Fast Lookup Dictionary To Return Only 1 UMLS Code
>> (CUI) [EXTERNAL]
>>
>> * External Email - Caution *
>>
>>
>> Hello,
>>
>> I am a medical student who happened to come across cTAKES for a project I
What I would like to do is take a list of surgeries in a >> text file and have cTAKES output what it determines to be the best UMLS >> code (CUI) for that particular line. >> >> Each line of the text file is independent of the others (i.e., each line >> should be read and analyzed separately). For example, here's my list of >> the >> surgeries (Surgery_List.txt): >> Colonoscopy with Polypectomy >> Esophagogastroduodenoscopy Colonoscopy >> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle >> aspiration >> >> When I run the piper file (see below), I get the following output: >> Colonoscopy with Polypectomy >> "Colonoscopy" >> Procedure >> C0009378 colonoscopy >> "Polypectomy" >> Procedure >> C0521210 Resection of polyp >> >> Esophagogastroduodenoscopy Colonoscopy >> "Esophagogastroduodenoscopy" >> Procedure >> C0079304 Esophagogastroduodenoscopy >> "Colonoscopy" >> Procedure >> C0009378 colonoscopy >> >> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine needle >> aspiration >> "Esophagogastroduodenoscopy" >> Procedure >> C0079304 Esophagogastroduodenoscopy >> "Endoscopic ultrasound" >> Procedure >> C0376443 Endoscopic Ultrasound >> "Endoscopic" >> Procedure >> C0014245 Endoscopy (procedure) >> "ultrasound" >> Procedure >> C0041618 Ultrasonography >> "Fine needle aspiration" >> Procedure >> C1510483 Fine needle aspiration biopsy >> "aspiration" >> Procedure >> C0349707 Aspiration-action >> >> Here's the piper file I have been using: >> reader org.apache.ctakes.core.cr.FileTreeReader >> InputDirectory="C:\path\to\input\folder" >> load DefaultTokenizerPipeline.piper >> >> SentenceModelFile=C:\cTAKES_4.0.0\desc\ctakes-core\desc\analysis_engine\SentenceDetectorAnnotatorBIO.xml >> add ContextDependentTokenizerAnnotator >> add org.apache.ctakes.necontexts.ContextAnnotator >> addDescription POSTagger >> load ChunkerSubPipe.piper >> set ctakes.umlsuser=my_username ctakes.umlspw=my_password >> add 
org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator >> >> DictionaryDescriptor=C:\cTAKES_4.0.0\desc\ctakes-dictionary-lookup-fast\desc\analysis_engine\UmlsLookupAnnotator.xml >> >> LookupXml=C:\cTAKES_4.0.0\resources\org\apache\ctakes\dictionary\lookup\fast\sno_rx_16ab.xml >> add property.plaintext.PropertyTextWriterFit >> OutputDirectory="C:\path\to\output\folder" >> >> The workaround I have developed is as follows... >> 1.) Save each line of Surgery_List.txt to separate text files >> 2.) Use a Python script to parse each individual text file to extract the >> first UMLS code (CUI) given in the text file >> >> The above method works fine when there's only 10 lines, but not so well >> when there's 40,000 lines in Surgery_List.txt. >> >> Ideally, I would like for Fast Dictionary Lookup to just return the top >> result for each line of Surgery_List.txt. For example, Output.txt would >> look just like this: >> C0009378 >> C0079304 >> C0079304 >> >> Just for reference here's how UMLS codes correspond between >> Surgery_List.txt and Output.txt: >> C0009378 --> Colonoscopy with Polypectomy >> C0079304 --> Esophagogastroduodenoscopy Colonoscopy >> C0079304 --> Esophagogastroduodenoscopy with Endoscopic ultrasound Fine >> needle aspiration >> >> Is there something I can add to the piper file to make this happen? >> >> Currently, I have the cTAKES user version installed, but I could install >> the developer version if need be. I would just need a little guidance on >> which Java script I would need to modify to get the desired results. >> >> Thank You, >> >> Ryan Young >> MD/MBA Candidate >> Jacobs School of Medicine & Biomedical Sciences >> >
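
For the Python side of the workflow discussed in this thread, a minimal sketch of reading a line-aligned CUI file back in might look like the following. It assumes the writer produces exactly one output line per input line (i.e., each surgery line is detected as one sentence), with an empty line where no CUI was resolved. The function names and the example file name are hypothetical and not part of cTAKES.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def read_cuis(cui_path: str) -> List[Optional[str]]:
    """Return one entry per line of the CUI file; None marks an empty line."""
    lines = Path(cui_path).read_text(encoding="utf-8").splitlines()
    return [line.strip() or None for line in lines]


def pair_with_surgeries(
    surgeries: List[str], cuis: List[Optional[str]]
) -> List[Tuple[str, str]]:
    """Pair each surgery with its CUI, dropping rows where no CUI was found.

    Raises ValueError if the two lists differ in length, since that would mean
    the one-line-per-surgery assumption does not hold for this output file.
    """
    if len(surgeries) != len(cuis):
        raise ValueError(
            f"{len(surgeries)} surgeries but {len(cuis)} CUI lines; "
            "output is not line-aligned with the input"
        )
    return [(s, c) for s, c in zip(surgeries, cuis) if c is not None]
```

A call such as pair_with_surgeries(surgery_list, read_cuis("Surgery_List_cui.txt")) would then give only the rows with a resolved CUI. The length check is there because a silent misalignment would attach CUIs to the wrong surgeries.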