I am getting some performance numbers that don't make sense based on my understanding of how Jena and SDB work. It was my understanding that nothing is cached in memory and that each API call results in queries against the database. But my results are very different: they suggest that the very first call does a significant amount of processing to produce its results, and that this work is then reused by subsequent calls.
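As an illustration of the pattern suspected here (this is not Jena or SDB code), the sketch below uses a hypothetical `LazyStore` class that prepares shared state on its first query and reuses it afterwards. Every name and value in it is an assumption made for the example. Timing each call separately, as `timeQuery` does, is one way to tell a one-time setup cost from a per-query cost.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FirstCallCost {

    /** Hypothetical stand-in for a model that prepares shared state lazily. */
    static class LazyStore {
        private final Map<String, List<String>> raw;  // class name -> instances
        private Map<String, List<String>> index;      // built on the first query only
        int builds = 0;                               // number of times the index was built

        LazyStore(Map<String, List<String>> raw) {
            this.raw = raw;
        }

        List<String> instancesOf(String cls) {
            if (index == null) {
                // Placeholder for expensive one-time preparation.
                index = new HashMap<>(raw);
                builds++;
            }
            return index.getOrDefault(cls, new ArrayList<>());
        }
    }

    /** Times a single query and returns the elapsed wall-clock seconds. */
    static double timeQuery(LazyStore store, String cls) {
        long start = System.nanoTime();
        int n = store.instancesOf(cls).size();
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%s %d instances found. Access took %.3f seconds%n",
                cls, n, seconds);
        return seconds;
    }

    public static void main(String[] args) {
        // Made-up data, only to exercise the timing helper.
        Map<String, List<String>> raw = new HashMap<>();
        raw.put("Prostate_Cancer", List.of("a", "b", "c"));
        raw.put("Lung_Cancer", List.of("d"));

        LazyStore store = new LazyStore(raw);
        timeQuery(store, "Prostate_Cancer"); // pays the one-time build
        timeQuery(store, "Lung_Cancer");     // reuses the index
        timeQuery(store, "Prostate_Cancer"); // repeat of the first query
    }
}
```

If a real model behaved like this stand-in, only the first query would be slow, regardless of which class it asked about. Repeating the first query at the end of a run would show whether that is the case.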
I’ll try to explain in detail what I have. There is an ICD9 ontology; it is strictly a hierarchy of 11874 classes, each class has a singleton, and the hierarchy is around 5 levels deep. Since this is a read-only set of data, I created an OntModel with the appropriate level of reasoning, wrote it out, and then read it into another model. Below is the number of nodes for that ICD9 model, in both forms, to give you an idea of how large it is.

    mysql> select lex, count(*) from Nodes, Quads where g = hash group by lex;
    +---------------------------------------------------------+----------+
    | lex                                                     | count(*) |
    +---------------------------------------------------------+----------+
    | http://purl.bioontology.org/ontology/HOM_ICD9/          |    73420 |
    | http://purl.bioontology.org/ontology/HOM_ICD9_inferred/ |   262024 |
    | http://www.sas.com/hls/hoa/patient/                     |     1282 |
    +---------------------------------------------------------+----------+

Though I am reading in the patient ontology, I am not yet using it; that will be my next test.

For the benchmark application, I read in a file that contains the following descriptions of diseases, each defined via a set of ICD9 codes. I add the HOM_ICD9_inferred and patient ontologies using addSubModel().

    @prefix :         <http://www.sas.com/hls/ex/> .
    @prefix HOM_ICD9: <http://purl.bioontology.org/ontology/HOM_ICD9/> .
    @prefix xsd:      <http://www.w3.org/2001/XMLSchema#> .
    @prefix rdf:      <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix owl:      <http://www.w3.org/2002/07/owl#> .

    :Disease rdf:type owl:Class ;
        rdfs:label "Disease"^^xsd:string .

    :Colon_Cancer rdf:type owl:Class ;
        rdfs:subClassOf :Disease ;
        rdfs:label "Colon Cancer"^^xsd:string .

    :Anal_Cancer rdf:type owl:Class ;
        rdfs:subClassOf :Disease ;
        rdfs:label "Anal Cancer"^^xsd:string .

    :Lung_Cancer rdf:type owl:Class ;
        rdfs:subClassOf :Disease ;
        rdfs:label "Lung Cancer"^^xsd:string .

    :Breast_Cancer rdf:type owl:Class ;
        rdfs:subClassOf :Disease ;
        rdfs:label "Breast Cancer"^^xsd:string .

    :Prostate_Cancer rdf:type owl:Class ;
        rdfs:subClassOf :Disease ;
        rdfs:label "Prostate Cancer"^^xsd:string .

    :Breast_Cancer a owl:Class ;
        owl:unionOf (
            HOM_ICD9:HOM_ICD_10558  # V10.3
            HOM_ICD9:HOM_ICD_1292   # 174
            HOM_ICD9:HOM_ICD_1297   # 174.4
            HOM_ICD9:HOM_ICD_1300   # 174.8
            HOM_ICD9:HOM_ICD_1294   # 174.1
            HOM_ICD9:HOM_ICD_1295   # 174.2
            HOM_ICD9:HOM_ICD_1299   # 174.6
            HOM_ICD9:HOM_ICD_1758   # 233
            HOM_ICD9:HOM_ICD_1296   # 174.3
            HOM_ICD9:HOM_ICD_1298   # 174.5
            HOM_ICD9:HOM_ICD_1304   # 175.9
            HOM_ICD9:HOM_ICD_1301   # 174.9
            HOM_ICD9:HOM_ICD_1302   # 175
        ) .

    :Lung_Cancer a owl:Class ;
        owl:unionOf (
            HOM_ICD9:HOM_ICD_1231   # 162.9
            HOM_ICD9:HOM_ICD_1744   # 231.2
            HOM_ICD9:HOM_ICD_10551  # V10.11
            HOM_ICD9:HOM_ICD_1228   # 162.4
            HOM_ICD9:HOM_ICD_1229   # 162.5
            HOM_ICD9:HOM_ICD_1227   # 162.3
            HOM_ICD9:HOM_ICD_1230   # 162.8
            HOM_ICD9:HOM_ICD_1544   # 209.21
            HOM_ICD9:HOM_ICD_1226   # 162.2
        ) .

    :Colon_Cancer a owl:Class ;
        owl:unionOf (
            HOM_ICD9:HOM_ICD_10546  # V10.05
            HOM_ICD9:HOM_ICD_1170   # 153.6
            HOM_ICD9:HOM_ICD_1166   # 153.2
            HOM_ICD9:HOM_ICD_1169   # 153.5
            HOM_ICD9:HOM_ICD_1163   # 153
            HOM_ICD9:HOM_ICD_1165   # 153.1
            HOM_ICD9:HOM_ICD_1734   # 230.3
            HOM_ICD9:HOM_ICD_1173   # 153.9
            HOM_ICD9:HOM_ICD_1168   # 153.4
            HOM_ICD9:HOM_ICD_1540   # 209.16
            HOM_ICD9:HOM_ICD_1539   # 209.15
            HOM_ICD9:HOM_ICD_1538   # 209.14
            HOM_ICD9:HOM_ICD_1537   # 209.13
            HOM_ICD9:HOM_ICD_1172   # 153.8
            HOM_ICD9:HOM_ICD_1536   # 209.12
            HOM_ICD9:HOM_ICD_1535   # 209.11
            HOM_ICD9:HOM_ICD_1171   # 153.7
            HOM_ICD9:HOM_ICD_1533   # 209.1
            HOM_ICD9:HOM_ICD_1167   # 153.3
            HOM_ICD9:HOM_ICD_1202   # 159
        ) .

    :Prostate_Cancer a owl:Class ;
        owl:unionOf (
            HOM_ICD9:HOM_ICD_10566  # V10.46
            HOM_ICD9:HOM_ICD_1767   # 233.4
            HOM_ICD9:HOM_ICD_1343   # 185
        ) .

    :Anal_Cancer a owl:Class ;
        owl:unionOf (
            HOM_ICD9:HOM_ICD_1179   # 154.8
            HOM_ICD9:HOM_ICD_10547  # V10.06
            HOM_ICD9:HOM_ICD_1178   # 154.3
            HOM_ICD9:HOM_ICD_8412   # 796.76
            HOM_ICD9:HOM_ICD_8410   # 796.74
            HOM_ICD9:HOM_ICD_8409   # 796.73
            HOM_ICD9:HOM_ICD_1177   # 154.2
            HOM_ICD9:HOM_ICD_8408   # 796.72
            HOM_ICD9:HOM_ICD_8407   # 796.71
            HOM_ICD9:HOM_ICD_8405   # 796.7
            HOM_ICD9:HOM_ICD_1737   # 230.6
            HOM_ICD9:HOM_ICD_1736   # 230.5
            HOM_ICD9:HOM_ICD_1174   # 154
            HOM_ICD9:HOM_ICD_1735   # 230.4
            HOM_ICD9:HOM_ICD_1541   # 209.17
        ) .

Below are the results from running my program on a laptop. Once I do this on our server, I expect to get better numbers, but the general pattern of time differences will likely be the same. As you can see, the very first access, of the Prostate_Cancer class and its instances, took 4081 seconds with just 3 classes involved. Yet subsequent calls for the other disease classes took substantially less time. Can this all be attributed to the time it takes the database to cache things, or is something going on in Jena that explains this?

I’ll be running this on our Linux server and will also try TDB for the ICD9 ontology to see how much difference it makes. But it would be really nice to hear if there is a Jena- or SDB-specific explanation for these results. Performance will be an important consideration for this application, and I want to be sure I am doing things in a way that gets the best possible performance when using Jena.

    Loading http://purl.bioontology.org/ontology/HOM_ICD9_inferred/ took 0.359 seconds
    Loading http://www.sas.com/hls/patient/ took 0.0 seconds
    http://www.sas.com/hls/ex/Prostate_Cancer 3 instances found. Access took 4080.932 seconds
    http://www.sas.com/hls/ex/Breast_Cancer 28 instances found. Access took 3.313 seconds
    http://www.sas.com/hls/ex/Lung_Cancer 9 instances found. Access took 1.062 seconds
    http://www.sas.com/hls/ex/Anal_Cancer 17 instances found. Access took 1.656 seconds
    http://www.sas.com/hls/ex/Colon_Cancer 25 instances found. Access took 2.641 seconds
    http://www.sas.com/hls/ex/Disease 81 instances found. Access took 4.891 seconds

David Jordan
Software Developer
SAS Institute Inc.
Health & Life Sciences, Research & Development
Bldg R ▪ Office 4467
600 Research Drive ▪ Cary, NC 27513
Tel: 919 531 1233
www.sas.com
