Hi Maite,

> Does anyone know why is it [UmlsDictionaryLookupAnnotator] so slow?

The top 5 reasons (1-3 are 90% of the problem):
1. The dictionary database is bloated with unwanted entries
2. The dictionary database indexing is sub-optimal
3. The second drug lookup with orangebook filtering takes extra time
4. The matching algorithm does a little more work than is necessary
5. There is some redundancy

> my interest is to be able to create my own HsqlDb-based dictionary

If you want to build a database using a subset of UMLS, check out the Dictionary Tool in Sandbox. It can build custom hsqldb dictionaries in both the new (-fast) and old format using sources, tuis, filters, etc. that you specify in plaintext parameter files. Several types of default setups are already available.

It is fully functional, but it has been an off-hours work in progress, so functionality still changes and documentation is lacking. There is a howto.txt in the dictionarytool/doc/ directory.

*NOTE: if your custom dictionaries are small (~1000 entries?) then it would probably be easier to just put them into a bar-separated value (bsv) file. There are examples in the dictionary-fast-res example/bsv/ directory.

Sean

-----Original Message-----
From: Maite Meseure Hugues [mailto:[email protected]]
Sent: Tuesday, March 10, 2015 12:35 PM
To: [email protected]
Subject: Questions about dictionary-lookup and dictionary-lookup-fast

Hi everyone,

1) I am currently working on BagOfCuisGenerator.java with the analysis engine 'AggregatePlaintextUMLSProcessor.xml', but that process is very slow at that step:

INFO UmlsDictionaryLookupAnnotator - process(JCas)

Does anyone know why is it so slow?

2) I also tried with 'AggregatePlaintextFastUMLSProcessor.xml' and it's actually pretty fast, as its name suggests, but my interest is to be able to create my own HsqlDb-based dictionary, like we can do with a Lucene index, and integrate it in the process. Is it possible with the fast version? Do you have any pointers that could allow me to do that?

Thank you very much for your time.

--
Maïté Meseure Hugues
