RE: dictionary-look-fast fails to handle alternative CUIs

Finan, Sean Wed, 08 Jul 2015 11:56:49 -0700

By the way, in case you are wondering why it does this … the UMLS database that 
we use has roughly half a million CUIs.  Storing CUIs in the various tables as 
longs takes up a lot less space than storing them as 8-character strings.

From: britt fitch [mailto:[email protected]]
Sent: Wednesday, July 08, 2015 2:23 PM
To: [email protected]
Subject: dictionary-look-fast fails to handle alternative CUIs

This is largely directed to Sean but open to other feedback as well.

The current fast lookup using a BSV parses the first field as “C” and up to 7 
numerals, padding with “0” as needed to reach that length when applicable [see 
CuiCodeUtil.getCuiCode(String)].

The CUI string is then substring’d from 1 to len and parsed as a Long.

This is producing issues with other related, but separate, ontologies (MedGen) 
where the bulk of concepts use UMLS CUIs but some additional concepts were 
created by the NCBI where no CUI previously existed.
These MedGen-specific concepts are created with a prefix “CN” + 6 numerals, 
resulting in “N123456” failing to produce a Long.
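
A minimal sketch of the behaviour described above, to make the failure concrete. The class and method names here (CuiParseSketch, padCui, toCode) are hypothetical and this is not the actual CuiCodeUtil source; it only assumes the steps stated in this message: pad the digits after the leading “C” to 7 places, drop the first character, and parse the remainder as a Long.

```java
// Hypothetical illustration only; not the cTAKES CuiCodeUtil implementation.
final class CuiParseSketch {

    // Left-pad the part after the first character with zeros to 7 characters.
    static String padCui(String cui) {
        String tail = cui.substring(1);
        StringBuilder sb = new StringBuilder("C");
        for (int i = tail.length(); i < 7; i++) {
            sb.append('0');
        }
        return sb.append(tail).toString();
    }

    // Drop the leading character and parse the rest as a long.
    static long toCode(String cui) {
        String padded = padCui(cui);
        return Long.parseLong(padded.substring(1));
    }

    public static void main(String[] args) {
        // A standard UMLS-style CUI parses without trouble.
        System.out.println(toCode("C0012345"));

        // A MedGen-style "CN" + 6 numerals leaves "N123456" after the
        // substring, which Long.parseLong rejects.
        try {
            toCode("CN123456");
        } catch (NumberFormatException e) {
            System.out.println("Could not parse CN123456: " + e.getMessage());
        }
    }
}
```

Under these assumptions, any identifier whose second character is not a digit fails at the Long.parseLong step, which matches the MedGen case above.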

I wanted Sean’s thoughts on this, and to get some feedback on whether others are 
running into this issue and whether the community wants a solution that supports 
a CUI format beyond the standard C + 7 numerals.

I’m happy to make these edits and check them in, whether that means updating the 
CuiCodeUtil class or creating an entirely new BSVConceptFactory, if that’s what 
makes the most sense.

Thoughts?

Britt Fitch
Wired Informatics
265 Franklin St Ste 1702
Boston, MA 02110
http://wiredinformatics.com
[email protected]
