Hi Pratik,

You have discovered one of the hidden features of the dictionary creator gui.  
And of course when a "feature" is the opposite of what you want it is called a 
"bug".  It is both depending upon your perspective.

With regard to the "SY" type for rxnorm, you can find a list of the rxnorm term 
types here:
https://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/RXNORM/stats.html
Even though the table just states "designated synonym" I have witnessed that 
the SY type actually consists largely of drug synonyms with dosage, form, 
route, etc.  

By excluding this term type, along with term texts that an algorithm in the gui 
determines to contain dosage, form, route, etc., we greatly reduce the size of 
the database.  The idea behind this is that medication/drug in the ctakes type 
system has assignable attributes for dosage, route, form, etc.  The unique 
identifier applies to the drug itself and the additional information is stored 
elsewhere, outside the cui.  In addition to allowing a much smaller database, 
this allows simpler queries on ctakes-produced data:
1.  Give me all [drug CUI].
2.  Filter by [dose property value].

Many researchers are not interested in dose; they just want to know if the drug 
is taken.  They don't want to look for every single cui that indicates a unique 
combination of attributes.  Also, drugs can easily be filtered / sorted by 
attribute value ranges, and no post-process matching of multiple cuis to 
attribute values is needed for simple display, etc.

But one of the biggest advantages of this method is the smaller database 
footprint and faster lookup times.  Adding every permutation of attribute 
values for a drug makes the number of terms explode.
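
To make the two-step query concrete, here is a minimal sketch in Java.  It is 
only an illustration: DrugMention and DrugQuery are hypothetical stand-ins that 
I made up for this email, not classes from the ctakes type system, and the dose 
is simplified to a single number of milligrams.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a drug mention; NOT a class from the ctakes type system.
final class DrugMention {
    final String cui;     // identifies the drug only
    final String text;    // covered text in the note
    final double doseMg;  // dose kept as an attribute, outside the cui

    DrugMention(String cui, String text, double doseMg) {
        this.cui = cui;
        this.text = text;
        this.doseMg = doseMg;
    }
}

final class DrugQuery {
    // Step 1: give me all mentions that carry one drug cui.
    static List<DrugMention> byCui(List<DrugMention> mentions, String cui) {
        List<DrugMention> hits = new ArrayList<>();
        for (DrugMention mention : mentions) {
            if (mention.cui.equals(cui)) {
                hits.add(mention);
            }
        }
        return hits;
    }

    // Step 2: filter by a dose range (inclusive on both ends).
    static List<DrugMention> byDoseRange(List<DrugMention> mentions,
                                         double minMg, double maxMg) {
        List<DrugMention> hits = new ArrayList<>();
        for (DrugMention mention : mentions) {
            if (mention.doseMg >= minMg && mention.doseMg <= maxMg) {
                hits.add(mention);
            }
        }
        return hits;
    }
}
```

A researcher who only cares whether the drug is taken stops after byCui; one 
who cares about dose chains byDoseRange onto that result.  Neither has to know 
a separate cui for each strength.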

All that being said, I would like to enhance the gui to (among other things) 
allow the user to turn this on and off.  I do realize that some people might be 
particularly interested in drugs and have reason to require specific cuis to be 
present.  It is completely possible to do this, even easy wrt flipping a filter 
switch on and off.  However, the gui and 90% of my other ctakes contributions 
are done in my own spare time and I have to prioritize.  The gui currently fits 
the needs of 90% of users (afaik) so other items like a simple pipeline 
assembler gui, the piper files, a distributed run submitter gui, adding owl 
dictionaries, etc. are all higher priorities for me.  So, if you can volunteer 
a little time to improving the dictionary creator gui that would be fantastic!
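
For anybody who wants to pick this up, the switch could look roughly like the 
sketch below.  This is not the code in MrconsoParser.java; TermTypeFilter and 
its methods are hypothetical names for illustration.  The only things assumed 
from the real data are the pipe-delimited MRCONSO.RRF layout, where the source 
(SAB) is the 12th column and the term type (TTY) is the 13th.  It covers only 
the term type exclusion, not the algorithm that detects dosage, form, route, 
etc. in term texts.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a term type exclusion that can be switched on and off.
final class TermTypeFilter {
    // Zero-based column positions in a pipe-delimited MRCONSO.RRF line.
    static final int SAB_INDEX = 11;  // source vocabulary
    static final int TTY_INDEX = 12;  // term type

    private final String source;
    private final Set<String> excludedTypes;
    private boolean enabled = true;

    TermTypeFilter(String source, String... excludedTypes) {
        this.source = source;
        this.excludedTypes = new HashSet<>(Arrays.asList(excludedTypes));
    }

    // Flip the exclusion on or off, e.g. from a checkbox in a gui.
    void setEnabled(boolean enabled) {
        this.enabled = enabled;
    }

    // Returns true if the line should be kept for the dictionary.
    boolean accepts(String rrfLine) {
        // Limit of -1 keeps trailing empty columns.
        String[] columns = rrfLine.split("\\|", -1);
        if (columns.length <= TTY_INDEX) {
            return false;  // too few columns to be a usable line
        }
        if (!enabled) {
            return true;   // exclusion switched off: keep every term type
        }
        boolean excluded = source.equals(columns[SAB_INDEX])
                && excludedTypes.contains(columns[TTY_INDEX]);
        return !excluded;
    }
}
```

With new TermTypeFilter("RXNORM", "SY") the default behavior stays as it is 
today, and setEnabled(false) lets the SY terms through.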

I hope that this answers your questions,
Sean

P.S., you might know this already, but "Castor" is the actual drug source.  It 
comes from the "Castor" plant (Ricinus communis), which surprises some people 
because plain old ricin is poisonous.  The ricin is denatured during oil 
production.  http://www.library.illinois.edu/vex/toxic/castor/castor.htm
"Oil" is obviously the form, which does have a cui of its own to make output 
data searching easier.
"793 mg" is either the amount or the strength ... Because this can have a huge 
range of values it does not get a cui.


-----Original Message-----
From: pratik agarwal [mailto:pratikagarwal2...@gmail.com] 
Sent: Thursday, January 19, 2017 4:01 AM
To: dev@ctakes.apache.org
Subject: Re: Getting specific RXNORM and ICD codes instead of class codes

Hi Sean

I have 2 problems here that I need help with:

*Problem 1:*
I added a custom dictionary using the dictionary-gui module that you built.
I downloaded the complete rxnorm files from UMLS website:
RxNorm_full_prescribe_01032017.zip
<https://download.nlm.nih.gov/rxnorm/RxNorm_full_prescribe_01032017.zip>.
You mentioned in the dictionary-gui thread that the module parses 
"MRCONSO.RRF" to create the dictionary in .script format. I found a file 
"RXNCONSO.RRF" in the zip directory I downloaded, and changed every mention of 
"RXNORM" to "RXNORM_16AA_160906F" and appended it to "MRCONSO.RRF"
thinking that the codes will get added to the dictionary, but when I passed a 
query to the database using .rc file created:


*SELECT * FROM RXNORM_16AA_160906F WHERE RXNORM_16AA_160906F = 997422; *

it returned 0 matches.

Note that 997422 was present in the modified "MRCONSO.RRF" as:
*997422|ENG||||||1358127|1358127|997422||RXNORM_16AA_160906F|SY|997422|Allegra
180 MG Oral Tablet||N|4096|*

This didn't seem to work, so I opened "MrconsoParser.java" in the dictionary-gui 
module and I found a line:

*static private final String[] RXNORM_EXCLUSIONS = { "SY" };*

So I removed *"SY"* from the set and made it empty: *RXNORM_EXCLUSIONS = { }; 
*thinking maybe the lines having *SY *are being excluded, but even after this 
it was no use.

It would be great if you could help me understand what's going on here.


*Problem 2: *
When I query the database with the code: *309035,* it returns a match where 
*CUI = 975312*.
This is the code for *Castor Oil 793MG. *

Since it exists in the dictionary, when I pass *"Castor Oil 793 MG", *I should 
ideally get:
Castor Oil 793 MG --> codes:[309035]

Instead, I still get:
Castor --> codes:[2129]
Oil --> codes:[1021284]
793 MG --> codes:[]

Please help me understand how this is actually picking out the covered text, 
and if there's a way I can modify the pattern so that it gives me the required 
output.

Thanks and Best Regards
Pratik Agarwal

On Tue, Jan 3, 2017 at 7:42 PM, Finan, Sean < sean.fi...@childrens.harvard.edu> 
wrote:

> Hi Pratik,
>
> Because combinations of strength, route, form, etc. for medications 
> amount to an enormous number of unique terms, the possible 
> combinations are not included in the default ctakes dictionary.
>
> You can:
> 1.  Add a custom dictionary with your fully-defined terms of interest, or
> 2.  Create a post-process module that maps [drug, strength, route,
> form] to rxnorm codes based upon coded drug and following text or other
> codes.
> 3.  Use something like the drug-ner module to identify drug attributes
> and use them instead of fully-specifying drug, strength, route, form codes.
>
> There may be other possibilities, or maybe somebody out there has 
> already done this and can be of further assistance.
>
> Sean
>
> -----Original Message-----
> From: pratik agarwal [mailto:pratikagarwal2...@gmail.com]
> Sent: Monday, December 26, 2016 4:57 AM
> To: dev@ctakes.apache.org
> Subject: Getting specific RXNORM and ICD codes instead of class codes
>
> Hi Everyone
>
> I am using cTAKES from the svn repository in IntelliJ IDEA. I wrote a 
> small script calling the *getFastPipeline()* function from the
> *ClinicalPipelineFactory* class.
>
> On passing the string:
> *"clonazePAM 0.5 mg oral tablet*
> *clopidogrel 75 mg oral tablet"*
>
> I'm getting the annotated RXNORM tokens as:
>
> *clonazePAM --> codes: [2598]*
> *0.5 mg --> codes: []*
> *0.5 --> codes: []*
> *oral tablet --> codes: [317541]*
>
>
> *clopidogrel --> codes: [32968]*
> *75mg --> codes: []*
> *oral tablet --> codes: [317541]*
>
> where actually I need something like:
>
> *clonazePAM 0.5 mg oral tablet --> codes: [197527]*
> *clopidogrel 75 mg oral tablet --> codes: [309362]*
>
> Can you please suggest if there's a way I can get these specific codes 
> instead of the class codes?
>
> Thanks and Best Regards.
> Pratik Agarwal
>
