Thank you, Sean, for your complete reply; it's helpful.

On Tue, Mar 10, 2015 at 11:53 AM, Finan, Sean <[email protected]> wrote:
> Hi Maite,
>
> > Does anyone know why is it [UmlsDictionaryLookupAnnotator] so slow?
> The top 5 reasons (1-3 are 90% of the problem):
> 1. The dictionary database is bloated with unwanted entries
> 2. The dictionary database indexing is sub-optimal
> 3. The second drug lookup with orangebook filtering takes extra time
> 4. The matching algorithm does a little more work than is necessary
> 5. There is some redundancy
>
> > my interest is to be able to create my own HsqlDb-based dictionary
> If you want to build a database using a subset of UMLS, check out the
> Dictionary Tool in Sandbox. It can build custom hsqldb dictionaries in
> both the new (-fast) and old format using sources, tuis, filters, etc. that
> you specify in plaintext parameter files. Several types of default setups
> are already available. It is fully functional, but it has been a
> work-in-progress during my off-hours, so functionality changes and
> documentation is lacking, but there is a howto.txt in the
> dictionarytool/doc/ directory.
>
> *NOTE: if your custom dictionaries are small (~1000 entries?) then it
> would probably be easier to just throw them into a bar-separated value
> (bsv) file. There are examples in the dictionary-fast-res example/bsv/
> directory.
>
> Sean
>
> -----Original Message-----
> From: Maite Meseure Hugues [mailto:[email protected]]
> Sent: Tuesday, March 10, 2015 12:35 PM
> To: [email protected]
> Subject: Questions about dictionary-lookup and dictionary-lookup-fast
>
> Hi everyone,
>
> 1) I am currently working on BagOfCuisGenerator.java with the analysis
> engine 'AggregatePlaintextUMLSProcessor.xml', but that process is very slow
> at that step:
>
> INFO UmlsDictionaryLookupAnnotator - process(JCas)
>
> Does anyone know why is it so slow?
>
> 2) I also tried with 'AggregatePlaintextFastUMLSProcessor.xml' and it's
> actually pretty fast, as its name suggests, but my interest is to be able
> to create my own HsqlDb-based dictionary, like we can do with a Lucene
> index, and integrate it in the process. Is it possible with the fast
> version? Do you have any pointers that could allow me to do that?
>
> Thank you very much for your time.
>
> --
> --
> Maïté Meseure Hugues
>

--
--
Maïté Meseure Hugues
