> These are 3 tables in the database aiunstructured.

-- Ok. It might be beneficial to combine them into a single table and use just 1 annotator, especially since the total number of rows (140k) is relatively small.
> I would try to use exclusionTags and minimum span in my piper file

-- Sounds good.

Sean,

I am not sure what would count as a large dictionary for Ctakes. Currently the numbers of rows in the above tables are:

concepts: 70,000 rows
drug: 30,000 rows
persons: 40,000 rows

As for the *RaxaDefaultJcasTermAnnotator* part: it is similar to org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator. I have only changed the value of the protected variable _minimumLookupSpan (to 1) of the AbstractJCasTermAnnotator class.

I will try using exclusionTags and minimum span in my piper file and analyse the result. If there is a better way to implement the above scenario using a *single dictionary instance*, please let me know.

On Fri, Feb 22, 2019 at 6:52 PM Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:

> Hi Zakir,
>
> Thank you for the information. Just out of curiosity, why did you decide to go with 3 xml files instead of 1? You can combine all of those specs into 1 xml and a single instance of the dictionary lookup class will handle it.
>
> Without a little more knowledge of the dictionaries that you reference I still can't say much. If they are huge then that is obviously going to impact the run time.
>
> I just realized that part of your problem detecting things like "P 90" is most likely part of speech tagging. There is a parameter named "exclusionTags" that prevents certain parts of speech such as Verb from being used in lookup. When using the ctakes dictionary lookup you might want to change your piper to something like:
>
> // Do not exclude words of any part of speech tag for dictionary lookup.
> set exclusionTags=""
> // Use span of 1 for dictionary lookup.
> set minimumSpan=1
> // Set the path to the xml file containing information for dictionary lookup configuration.
> set LookupXml=org/apache/ctakes/dictionary/lookup/fast/tinyDictSpec.xml
> // Annotate concepts based upon default algorithms.
> add DefaultJCasTermAnnotator
>
> -- though again, you are using something named RaxaDefaultJCasTermAnnotator, and I have no idea what that is.
>
> Sean
>
> ________________________________________
> From: Zakir Saifi <zakir.sa...@raxa.com>
> Sent: Thursday, February 21, 2019 1:54 AM
> To: dev@ctakes.apache.org
> Subject: Re: Making Ctakes Faster after Changing default lookup span value [EXTERNAL]
>
> Thanks Sean for the early reply,
>
> Here are the contents of the files you are looking for:
>
> *1. tinyDictSpec.xml*
>
> ============
>
> <?xml version="1.0" encoding="UTF-8"?>
> <lookupSpecification>
>   <dictionaries>
>     <dictionary>
>       <name>LabAnnotatorTestDict</name>
>       <implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary</implementationName>
>       <properties>
>         <property key="jdbcDriver" value="com.mysql.jdbc.Driver"/>
>         <property key="jdbcUrl" value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>         <property key="jdbcUser" value="root"/>
>         <property key="jdbcPass" value=""/>
>         <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
>         <property key="umlsVendor" value="NLM-6515182895"/>
>         <property key="umlsUser" value=""/>
>         <property key="umlsPass" value=""/>
>         <property key="rareWordTable" value="rareword"/>
>       </properties>
>     </dictionary>
>   </dictionaries>
>
>   <conceptFactories>
>     <conceptFactory>
>       <name>LabAnnotatorTestConcepts</name>
>       <implementationName>org.apache.ctakes.dictionary.lookup2.concept.UmlsJdbcConceptFactory</implementationName>
>       <properties>
>         <property key="jdbcDriver" value="com.mysql.jdbc.Driver"/>
>         <property key="jdbcUrl" value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>         <property key="jdbcUser" value="root"/>
>         <property key="jdbcPass" value=""/>
>         <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
>         <property key="umlsVendor" value="NLM-6515182895"/>
>         <property key="umlsUser" value=""/>
>         <property key="umlsPass" value=""/>
>         <property key="tuiTable" value="tui"/>
>       </properties>
>     </conceptFactory>
>   </conceptFactories>
>
>   <dictionaryConceptPairs>
>     <dictionaryConceptPair>
>       <name>LabAnnotatorPair</name>
>       <dictionaryName>LabAnnotatorTestDict</dictionaryName>
>       <conceptFactoryName>LabAnnotatorTestConcepts</conceptFactoryName>
>     </dictionaryConceptPair>
>   </dictionaryConceptPairs>
>
>   <rareWordConsumer>
>     <name>Term Consumer</name>
>     <implementationName>org.apache.ctakes.dictionary.lookup2.consumer.DefaultTermConsumer</implementationName>
>     <properties>
>       <property key="codingScheme" value="custom"/>
>     </properties>
>   </rareWordConsumer>
> </lookupSpecification>
>
> ===========
>
> *2. drugConcept.xml*
>
> <?xml version="1.0" encoding="UTF-8"?>
> <lookupSpecification>
>   <dictionaries>
>     <dictionary>
>       <name>LabAnnotatorTestDict</name>
>       <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.dictionary.UmlsJdbcDrugTermsDictonary</implementationName>
>       <properties>
>         <property key="jdbcDriver" value="com.mysql.jdbc.Driver"/>
>         <property key="jdbcUrl" value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>         <property key="jdbcUser" value="root"/>
>         <property key="jdbcPass" value=""/>
>         <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
>         <property key="umlsVendor" value="NLM-6515182895"/>
>         <property key="umlsUser" value=""/>
>         <property key="umlsPass" value=""/>
>         <property key="rareWordTable" value="drug"/>
>       </properties>
>     </dictionary>
>   </dictionaries>
>
>   <conceptFactories>
>     <conceptFactory>
>       <name>LabAnnotatorTestConcepts</name>
>       <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.concept.UmlsJdbcDrugNameConceptFactory</implementationName>
>       <properties>
>         <property key="jdbcDriver" value="com.mysql.jdbc.Driver"/>
>         <property key="jdbcUrl" value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>         <property key="jdbcUser" value="root"/>
>         <property key="jdbcPass" value=""/>
>         <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
>         <property key="umlsVendor" value="NLM-6515182895"/>
>         <property key="umlsUser" value=""/>
>         <property key="umlsPass" value=""/>
>         <property key="tuiTable" value="tui"/>
>       </properties>
>     </conceptFactory>
>   </conceptFactories>
>
>   <dictionaryConceptPairs>
>     <dictionaryConceptPair>
>       <name>LabAnnotatorPair</name>
>       <dictionaryName>LabAnnotatorTestDict</dictionaryName>
>       <conceptFactoryName>LabAnnotatorTestConcepts</conceptFactoryName>
>     </dictionaryConceptPair>
>   </dictionaryConceptPairs>
>
>   <rareWordConsumer>
>     <name>Term Consumer</name>
>     <implementationName>org.apache.ctakes.dictionary.lookup2.consumer.DefaultTermConsumer</implementationName>
>     <properties>
>       <property key="codingScheme" value="custom"/>
>     </properties>
>   </rareWordConsumer>
> </lookupSpecification>
>
> *=======*
>
> *3. personName.xml*
>
> <?xml version="1.0" encoding="UTF-8"?>
> <lookupSpecification>
>   <dictionaries>
>     <dictionary>
>       <name>LabAnnotatorTestDict</name>
>       <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.dictionary.UmlsJdbcPersonDictionary</implementationName>
>       <properties>
>         <property key="jdbcDriver" value="com.mysql.jdbc.Driver"/>
>         <property key="jdbcUrl" value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>         <property key="jdbcUser" value="root"/>
>         <property key="jdbcPass" value=""/>
>         <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
>         <property key="umlsVendor" value="NLM-6515182895"/>
>         <property key="umlsUser" value=""/>
>         <property key="umlsPass" value=""/>
>         <property key="rareWordTable" value="person_name"/>
>       </properties>
>     </dictionary>
>   </dictionaries>
>
>   <conceptFactories>
>     <conceptFactory>
>       <name>LabAnnotatorTestConcepts</name>
>       <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.concept.UmlsJdbcPersonNameConceptFactory</implementationName>
>       <properties>
>         <property key="jdbcDriver" value="com.mysql.jdbc.Driver"/>
>         <property key="jdbcUrl" value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
>         <property key="jdbcUser" value="root"/>
>         <property key="jdbcPass" value=""/>
>         <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
>         <property key="umlsVendor" value="NLM-6515182895"/>
>         <property key="umlsUser" value=""/>
>         <property key="umlsPass" value=""/>
>         <property key="tuiTable" value="tui"/>
>       </properties>
>     </conceptFactory>
>   </conceptFactories>
>
>   <dictionaryConceptPairs>
>     <dictionaryConceptPair>
>       <name>LabAnnotatorPair</name>
>       <dictionaryName>LabAnnotatorTestDict</dictionaryName>
>       <conceptFactoryName>LabAnnotatorTestConcepts</conceptFactoryName>
>     </dictionaryConceptPair>
>   </dictionaryConceptPairs>
>
>   <rareWordConsumer>
>     <name>Term Consumer</name>
>     <implementationName>org.apache.ctakes.dictionary.lookup2.consumer.DefaultTermConsumer</implementationName>
>     <properties>
>       <property key="codingScheme" value="custom"/>
>     </properties>
>   </rareWordConsumer>
> </lookupSpecification>
>
> *RaxaDefaultJcasTermAnnotator* is similar to org.apache.ctakes.dictionary.lookup2.ae.*DefaultJCasTermAnnotator*. I have only changed the value of the _minimumLookupSpan variable (to 1) of AbstractJCasTermAnnotator.
>
> On Thu, Feb 21, 2019 at 11:41 AM Finan, Sean <sean.fi...@childrens.harvard.edu> wrote:
>
> > Hi Zakir,
> >
> > In order for me to help you, I need to know more about:
> >
> > Your primary dictionary:
> > LookupXml=org/apache/ctakes/dictionary/lookup/fast/tinyDictSpec.xml
> >
> > Your custom dictionary lookup #1:
> > add org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator LookupXml=org/apache/ctakes/dictionary/lookup/fast/drugConcept.xml
> >
> > Your custom dictionary lookup #2:
> > add org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator LookupXml=org/apache/ctakes/dictionary/lookup/fast/personName.xml
> >
> > As for your metrics,
> > > For lookup span value of 3 (default), rest call was taking less than 2s for text like ( Systolic blood pressure 180 ) is now taking around 5s.
> >
> > Does this mean that a document containing such text took 2 seconds, or that each discovered annotation took 2 seconds on average?
> >
> > I realize that moving from 3 characters to 1 means that every "a" "to" "in" "of" "an" "1" "2" ... is used for lookup. However, that should not multiply the processing time by 2.5.
> >
> > I have to wonder if the non-ctakes org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaDefaultJCasTermAnnotator is doing something suspect.
> >
> > Sean
> >
> > ________________________________________
> > From: Zakir Saifi <zakir.sa...@raxa.com>
> > Sent: Thursday, February 21, 2019 12:18 AM
> > To: dev@ctakes.apache.org
> > Subject: Making Ctakes Faster after Changing default lookup span value [EXTERNAL]
> >
> > Hi Everyone,
> >
> > I am using Ctakes to structure some clinical text. In my clinical text there are single-character words such as *P 90 (Pulse 90)*. I want Ctakes to detect those. Since the default minimum span detected by Ctakes is 3, I was not able to detect these concepts. Therefore I changed the value of _minimumLookupSpan to 1. Now I am able to detect one-character words using Ctakes after adding them to my custom dictionary.
> >
> > My problem is that after changing the value of _minimumLookupSpan, Ctakes has become slow. I am using Ctakes-web-Rest (REST service using Ctakes). With the default lookup span of 3, a REST call took less than 2s for text like ( Systolic blood pressure 180 ); it now takes around 5s.
> >
> > How can I make Ctakes faster? Is there any configuration that improves performance without losing the current detection rate?
> >
> > Here is the content of my current piper file.
> >
> > load DefaultFastPipeline
> > add org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaDefaultJCasTermAnnotator LookupXml=org/apache/ctakes/dictionary/lookup/fast/tinyDictSpec.xml
> > add LabValueFinder
> > add org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator LookupXml=org/apache/ctakes/dictionary/lookup/fast/drugConcept.xml
> > add org.apache.ctakes.drugner.ae.DrugMentionAnnotator STATUS_BOUNDARY_ANN_TYPE="org.apache.ctakes.typesystem.type.textsem.MedicationMention"
> > add org.apache.ctakes.raxactakes.dictionary.lookup2.ae.RaxaJCasTermAnnotator LookupXml=org/apache/ctakes/dictionary/lookup/fast/personName.xml
> > add org.apache.ctakes.raxactakes.core.ae.PersonNameFinder
> >
> > addDescription EventAnnotator
> > addLogged BackwardsTimeAnnotator classifierJarPath=/org/apache/ctakes/temporal/ae/timeannotator/model.jar
> > addLogged DocTimeRelAnnotator classifierJarPath=/org/apache/ctakes/temporal/ae/doctimerel/model.jar
> > addLogged EventTimeRelationAnnotator classifierJarPath=/org/apache/ctakes/temporal/ae/eventtime/model.jar
> > addLogged EventEventRelationAnnotator classifierJarPath=/org/apache/ctakes/temporal/ae/eventevent/model.jar
> > addLogged ContextualModalityAnnotator classifierJarPath=/org/apache/ctakes/temporal/ae/contextualmodality/model.jar
> > addLogged EventAnnotator classifierJarPath=/org/apache/ctakes/temporal/ae/eventannotator/model.jar
> >
> > --
> > Regards
> > Zakir Saifi
> > (Software Developer at Raxa)
>
> --
> Regards
> Zakir Saifi
> (Software Developer at Raxa)

--
Regards
Zakir Saifi
(Software Developer at Raxa)
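
For reference, Sean's suggestion in this thread to put all of the specs into 1 xml might look roughly like the sketch below. It is an illustration only, not a tested configuration: the entry names (ConceptDict, DrugDict, ConceptFactory, DrugFactory, ConceptPair, DrugPair) are made up here, the class names and table names are copied from the files quoted in the thread, and the umlsUrl is written in plain form rather than the link-rewritten form shown in the mail. Whether the custom Raxa classes behave correctly side by side in one specification is not confirmed anywhere in the thread. Only two of the three dictionaries are shown; the person-name dictionary would follow the same pattern.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: one lookup specification holding several dictionaries.
     All <name> values below are hypothetical and must be unique. -->
<lookupSpecification>
  <dictionaries>
    <dictionary>
      <name>ConceptDict</name>
      <implementationName>org.apache.ctakes.dictionary.lookup2.dictionary.UmlsJdbcRareWordDictionary</implementationName>
      <properties>
        <property key="jdbcDriver" value="com.mysql.jdbc.Driver"/>
        <property key="jdbcUrl" value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
        <property key="jdbcUser" value="root"/>
        <property key="jdbcPass" value=""/>
        <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
        <property key="umlsVendor" value="NLM-6515182895"/>
        <property key="umlsUser" value=""/>
        <property key="umlsPass" value=""/>
        <property key="rareWordTable" value="rareword"/>
      </properties>
    </dictionary>
    <dictionary>
      <name>DrugDict</name>
      <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.dictionary.UmlsJdbcDrugTermsDictonary</implementationName>
      <properties>
        <property key="jdbcDriver" value="com.mysql.jdbc.Driver"/>
        <property key="jdbcUrl" value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
        <property key="jdbcUser" value="root"/>
        <property key="jdbcPass" value=""/>
        <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
        <property key="umlsVendor" value="NLM-6515182895"/>
        <property key="umlsUser" value=""/>
        <property key="umlsPass" value=""/>
        <property key="rareWordTable" value="drug"/>
      </properties>
    </dictionary>
  </dictionaries>

  <conceptFactories>
    <conceptFactory>
      <name>ConceptFactory</name>
      <implementationName>org.apache.ctakes.dictionary.lookup2.concept.UmlsJdbcConceptFactory</implementationName>
      <properties>
        <property key="jdbcDriver" value="com.mysql.jdbc.Driver"/>
        <property key="jdbcUrl" value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
        <property key="jdbcUser" value="root"/>
        <property key="jdbcPass" value=""/>
        <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
        <property key="umlsVendor" value="NLM-6515182895"/>
        <property key="umlsUser" value=""/>
        <property key="umlsPass" value=""/>
        <property key="tuiTable" value="tui"/>
      </properties>
    </conceptFactory>
    <conceptFactory>
      <name>DrugFactory</name>
      <implementationName>org.apache.ctakes.raxactakes.dictionary.lookup2.concept.UmlsJdbcDrugNameConceptFactory</implementationName>
      <properties>
        <property key="jdbcDriver" value="com.mysql.jdbc.Driver"/>
        <property key="jdbcUrl" value="jdbc:mysql://localhost:3306/aiunstructured?useUnicode=true&amp;characterEncoding=utf8&amp;useSSL=false&amp;autoReconnect=true"/>
        <property key="jdbcUser" value="root"/>
        <property key="jdbcPass" value=""/>
        <property key="umlsUrl" value="https://uts-ws.nlm.nih.gov/restful/isValidUMLSUser"/>
        <property key="umlsVendor" value="NLM-6515182895"/>
        <property key="umlsUser" value=""/>
        <property key="umlsPass" value=""/>
        <property key="tuiTable" value="tui"/>
      </properties>
    </conceptFactory>
  </conceptFactories>

  <!-- Each pair ties one dictionary to the factory that builds its concepts. -->
  <dictionaryConceptPairs>
    <dictionaryConceptPair>
      <name>ConceptPair</name>
      <dictionaryName>ConceptDict</dictionaryName>
      <conceptFactoryName>ConceptFactory</conceptFactoryName>
    </dictionaryConceptPair>
    <dictionaryConceptPair>
      <name>DrugPair</name>
      <dictionaryName>DrugDict</dictionaryName>
      <conceptFactoryName>DrugFactory</conceptFactoryName>
    </dictionaryConceptPair>
  </dictionaryConceptPairs>

  <rareWordConsumer>
    <name>Term Consumer</name>
    <implementationName>org.apache.ctakes.dictionary.lookup2.consumer.DefaultTermConsumer</implementationName>
    <properties>
      <property key="codingScheme" value="custom"/>
    </properties>
  </rareWordConsumer>
</lookupSpecification>
```

With a file like this, the piper would point a single lookup annotator at one LookupXml path instead of adding three annotators. If the three tables were merged into one, as also suggested in the thread, a single dictionary, concept factory, and pair entry would presumably be enough.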