Hi Maite,

> Does anyone know why it [UmlsDictionaryLookupAnnotator] is so slow?
The top 5 reasons (1-3 are 90% of the problem):
1.  The dictionary database is bloated with unwanted entries
2.  The dictionary database indexing is sub-optimal
3.  The second drug lookup with orangebook filtering takes extra time
4.  The matching algorithm does a little more work than is necessary
5.  There is some redundancy

> my interest is to be able to create my own HsqlDb-based dictionary
If you want to build a database using a subset of UMLS, check out the 
Dictionary Tool in Sandbox.  It can build custom hsqldb dictionaries in both 
the new (-fast) and old format using sources, tuis, filters, etc. that you 
specify in plaintext parameter files.  Several types of default setups are 
already available.  The tool is fully functional, but I have been working on it 
during my off-hours, so its functionality is still changing and documentation is 
sparse.  There is a howto.txt in the dictionarytool/doc/ directory.

*NOTE: if your custom dictionaries are small (~1000 entries?) then it would 
probably be easier to just throw them into a bar-separated value (bsv) file.  
There are examples in the dictionary-fast-res example/bsv/ directory.  
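
As a rough illustration of the bsv idea, a short Python sketch like the one 
below could write such a file.  The two-column layout (an identifier followed 
by the term text), the placeholder entries, and the names to_bsv_line and 
write_bsv are all assumptions made for illustration.  The files in the 
example/bsv/ directory show the column layout the lookup actually expects.

```python
def to_bsv_line(fields):
    """Join one entry's fields with '|' (bar-separated values)."""
    cleaned = [str(f).strip() for f in fields]
    for f in cleaned:
        # A bar or newline inside a field would corrupt the row structure.
        if "|" in f or "\n" in f:
            raise ValueError(f"field contains a delimiter or newline: {f!r}")
    return "|".join(cleaned)


def write_bsv(path, entries):
    """Write one bar-separated line per entry; return the number written."""
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for fields in entries:
            out.write(to_bsv_line(fields) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Placeholder entries: (hypothetical identifier, term text).
    # Replace with your own codes and terms, in the column order
    # used by the example bsv files.
    entries = [
        ("ID0001", "headache"),
        ("ID0002", "chest pain"),
    ]
    write_bsv("custom_terms.bsv", entries)
```

Rejecting fields that contain a bar keeps every row at the same column count, 
which is the main thing that can go wrong with a hand-built bsv file.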

Sean

-----Original Message-----
From: Maite Meseure Hugues [mailto:[email protected]] 
Sent: Tuesday, March 10, 2015 12:35 PM
To: [email protected]
Subject: Questions about dictionary-lookup and dictionary-lookup-fast

Hi everyone,

1) I am currently working on BagOfCuisGenerator.java with the analysis engine 
'AggregatePlaintextUMLSProcessor.xml', but that process is very slow at that 
step:

INFO UmlsDictionaryLookupAnnotator - process(JCas)

Does anyone know why it is so slow?

2) I also tried 'AggregatePlaintextFastUMLSProcessor.xml', and it is 
actually pretty fast, as its name suggests.  However, I would like to be able 
to create my own HsqlDb-based dictionary, as we can with a Lucene index, and 
integrate it into the process.  Is that possible with the fast version?  Do you 
have any pointers that could help me do that?

Thank you very much for your time.

--
 Maïté Meseure Hugues
