That does, thank you, as always. One other question: your docs say it should take around an hour and a half with 8 GB of RAM for the UMLS, but my times are turning out significantly lower (3-5 minutes). The *.gz output seems to be on the same order of magnitude as the included compressed concept graphs, and queries seem to run OK, but it makes me a little nervous that it is processing that fast. Should I be worried?

Thanks,
JG

On Thu, Oct 16, 2014 at 6:29 AM, vijay garla <[email protected]> wrote:

> I don't know what the difference between PAR/CHD (parent/child) and RB/RN
> (broader/narrower) is supposed to be. Some UMLS source vocabularies use
> PAR/CHD only/predominantly (e.g. SNOMED-CT), others use RB/RN (e.g.
> RXNORM). You can use and experiment with whatever relationships you want
> (I think there might be part of/contains relationships too).
>
> The concept graph is a directed acyclic graph, and the query should return
> parent-child edges (or maybe the other way around, not sure). If your
> query uses e.g. rel in ('PAR', 'CHD'), you will return edges going both
> directions. This shouldn't cause any problems, as we discard edges that
> induce cycles, but it will create a bunch of overhead for no gain.
>
> If you look at other concept graph configs, e.g.
> https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-ytex-res/src/main/resources/org/apache/ctakes/ytex/conceptGraph/sct-rxnorm.template.xml ,
> you will see that we use both PAR & RB relationships.
>
> HTH,
>
> VJ
>
> On Thu, Oct 16, 2014 at 2:58 AM, John Green <[email protected]> wrote:
>
> > Hope this finds everyone well.
> >
> > It is not immediately clear to me why
> >
> > select distinct cui1, cui2
> > from umls.MRREL
> > where sab in ('SNOMEDCT')
> > and rel in ('PAR')
> > order by cui1, cui2
> >
> > would only be selecting the relationship (REL) of PAR. I'm not sure of the
> > selection criteria. This is honestly probably directed mostly at Vijay, but
> > anyone else with experience in this domain would be a welcome voice. In the
> > paper on YTEX, for instance, PAR and RB are chosen for UMLS. Why? Does this
> > have to do with the "flattening" or "orphaning" that UMLS does to the
> > vocabularies it includes? Why not PAR, RB, and RN? Why not more? Was this a
> > computational (speed/memory) consideration, or a functional one that my
> > lack of familiarity with the domain is keeping me from seeing?
> >
> > I'm posting this fairly specific question to the Dev list because it directly
> > relates to building YTEX concept graphs, which is a functionality of our
> > distro here.
> >
> > Best!
> > JG
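
For anyone following the thread, here is a minimal sketch of the point above about edges that induce cycles. It is not the YTEX implementation: build_dag and the C1/C2/C3 identifiers are hypothetical placeholders, and it assumes edges are simply added in query order, with any edge that would close a cycle skipped. It illustrates why selecting both PAR and CHD should be harmless but wasteful: every reverse edge is fetched, checked, and then thrown away.

```python
from collections import defaultdict


def build_dag(edges):
    """Add edges in order, skipping any edge that would close a cycle.

    Hypothetical helper for illustration only, not the YTEX code.
    """
    children = defaultdict(set)
    kept, dropped = [], []

    def reaches(start, target):
        # Iterative depth-first search over the edges kept so far.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(children[node])
        return False

    for parent, child in edges:
        # parent -> child closes a cycle if child already reaches parent.
        if parent == child or reaches(child, parent):
            dropped.append((parent, child))
        else:
            children[parent].add(child)
            kept.append((parent, child))
    return kept, dropped


# Placeholder identifiers, not real CUIs. The last two edges are reverse
# edges of the kind a rel in ('PAR', 'CHD') query would also return.
edges = [("C1", "C2"), ("C2", "C3"), ("C2", "C1"), ("C3", "C1")]
kept, dropped = build_dag(edges)
print(kept)
print(dropped)
```

With a single relationship type such as PAR, the dropped list should stay small; with both directions selected, roughly half of the fetched edges would end up in it.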
