I am getting some performance numbers that don't make sense based on my 
understanding of how Jena and SDB work. My understanding was that nothing is 
cached in memory and that each API call results in queries against the 
database. But the results I am getting suggest that the very first call does a 
significant amount of processing, the results of which are then reused by 
subsequent calls.

I’ll try to explain in detail what I have. There is an ICD9 ontology that is 
strictly a hierarchy of 11874 classes; each class has a singleton, and the 
hierarchy is around 5 levels deep. Since this is a read-only set of data, I 
created an OntModel with the appropriate level of reasoning, wrote it out, and 
then read the result into another model. Below are the node counts for that 
ICD9 model, in both forms, to give you an idea of how large it is.

mysql> select lex, count(*) from Nodes, Quads where g = hash group by lex;
+---------------------------------------------------------+----------+
| lex                                                     | count(*) |
+---------------------------------------------------------+----------+
| http://purl.bioontology.org/ontology/HOM_ICD9/          |    73420 |
| http://purl.bioontology.org/ontology/HOM_ICD9_inferred/ |   262024 |
| http://www.sas.com/hls/hoa/patient/                     |     1282 |
+---------------------------------------------------------+----------+

Though I am reading in the patient ontology, I am not yet using it; that will 
be my next test.

For the benchmark application, I read in a file that contains the following 
descriptions of diseases, defined via a set of ICD9 codes. I add the 
HOM_ICD9_inferred and patient ontologies using addSubModel().

@prefix :        <http://www.sas.com/hls/ex/> .
@prefix HOM_ICD9: <http://purl.bioontology.org/ontology/HOM_ICD9/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .

:Disease rdf:type owl:Class ;
        rdfs:label "Disease"^^xsd:string .

:Colon_Cancer rdf:type owl:Class ;
        rdfs:subClassOf :Disease ;
        rdfs:label "Colon Cancer"^^xsd:string .

:Anal_Cancer rdf:type owl:Class ;
        rdfs:subClassOf :Disease ;
        rdfs:label "Anal Cancer"^^xsd:string .

:Lung_Cancer rdf:type owl:Class ;
        rdfs:subClassOf :Disease ;
        rdfs:label "Lung Cancer"^^xsd:string .

:Breast_Cancer rdf:type owl:Class ;
        rdfs:subClassOf :Disease ;
        rdfs:label "Breast Cancer"^^xsd:string .

:Prostate_Cancer rdf:type owl:Class ;
        rdfs:subClassOf :Disease ;
        rdfs:label "Prostate Cancer"^^xsd:string .

:Breast_Cancer a owl:Class ;
        owl:unionOf (
                HOM_ICD9:HOM_ICD_10558  # V10.3
                HOM_ICD9:HOM_ICD_1292   # 174
                HOM_ICD9:HOM_ICD_1297   # 174.4
                HOM_ICD9:HOM_ICD_1300   # 174.8
                HOM_ICD9:HOM_ICD_1294   # 174.1
                HOM_ICD9:HOM_ICD_1295   # 174.2
                HOM_ICD9:HOM_ICD_1299   # 174.6
                HOM_ICD9:HOM_ICD_1758   # 233
                HOM_ICD9:HOM_ICD_1296   # 174.3
                HOM_ICD9:HOM_ICD_1298   # 174.5
                HOM_ICD9:HOM_ICD_1304   # 175.9
                HOM_ICD9:HOM_ICD_1301   # 174.9
                HOM_ICD9:HOM_ICD_1302   # 175
        ) .

:Lung_Cancer a owl:Class ;
        owl:unionOf (
                HOM_ICD9:HOM_ICD_1231   # 162.9
                HOM_ICD9:HOM_ICD_1744   # 231.2
                HOM_ICD9:HOM_ICD_10551  # V10.11
                HOM_ICD9:HOM_ICD_1228   # 162.4
                HOM_ICD9:HOM_ICD_1229   # 162.5
                HOM_ICD9:HOM_ICD_1227   # 162.3
                HOM_ICD9:HOM_ICD_1230   # 162.8
                HOM_ICD9:HOM_ICD_1544   # 209.21
                HOM_ICD9:HOM_ICD_1226   # 162.2
        ) .

:Colon_Cancer a owl:Class ;
        owl:unionOf (
                HOM_ICD9:HOM_ICD_10546  # V10.05
                HOM_ICD9:HOM_ICD_1170   # 153.6
                HOM_ICD9:HOM_ICD_1166   # 153.2
                HOM_ICD9:HOM_ICD_1169   # 153.5
                HOM_ICD9:HOM_ICD_1163   # 153
                HOM_ICD9:HOM_ICD_1165   # 153.1
                HOM_ICD9:HOM_ICD_1734   # 230.3
                HOM_ICD9:HOM_ICD_1173   # 153.9
                HOM_ICD9:HOM_ICD_1168   # 153.4
                HOM_ICD9:HOM_ICD_1540   # 209.16
                HOM_ICD9:HOM_ICD_1539   # 209.15
                HOM_ICD9:HOM_ICD_1538   # 209.14
                HOM_ICD9:HOM_ICD_1537   # 209.13
                HOM_ICD9:HOM_ICD_1172   # 153.8
                HOM_ICD9:HOM_ICD_1536   # 209.12
                HOM_ICD9:HOM_ICD_1535   # 209.11
                HOM_ICD9:HOM_ICD_1171   # 153.7
                HOM_ICD9:HOM_ICD_1533   # 209.1
                HOM_ICD9:HOM_ICD_1167   # 153.3
                HOM_ICD9:HOM_ICD_1202   # 159
        ) .

:Prostate_Cancer a owl:Class ;
        owl:unionOf (
                HOM_ICD9:HOM_ICD_10566  # V10.46
                HOM_ICD9:HOM_ICD_1767   # 233.4
                HOM_ICD9:HOM_ICD_1343   # 185
        ) .

:Anal_Cancer a owl:Class ;
        owl:unionOf (
                HOM_ICD9:HOM_ICD_1179   # 154.8
                HOM_ICD9:HOM_ICD_10547  # V10.06
                HOM_ICD9:HOM_ICD_1178   # 154.3
                HOM_ICD9:HOM_ICD_8412   # 796.76
                HOM_ICD9:HOM_ICD_8410   # 796.74
                HOM_ICD9:HOM_ICD_8409   # 796.73
                HOM_ICD9:HOM_ICD_1177   # 154.2
                HOM_ICD9:HOM_ICD_8408   # 796.72
                HOM_ICD9:HOM_ICD_8407   # 796.71
                HOM_ICD9:HOM_ICD_8405   # 796.7
                HOM_ICD9:HOM_ICD_1737   # 230.6
                HOM_ICD9:HOM_ICD_1736   # 230.5
                HOM_ICD9:HOM_ICD_1174   # 154
                HOM_ICD9:HOM_ICD_1735   # 230.4
                HOM_ICD9:HOM_ICD_1541   # 209.17
        ) .

Below are the results from running my program on a laptop. Once I do this on 
our server, I expect to get better numbers, but the general pattern of time 
differences will likely be the same. As you can see, the very first access, to 
the Prostate_Cancer class and its instances, took 4081 seconds, with just 3 
classes involved. Yet subsequent calls for the other disease classes took 
substantially less time. Can this all be attributed to the time it takes the 
database to cache things, or is something going on in Jena that explains this?
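
One way to separate a one-time start-up cost from the per-call cost is to time 
the same call several times in a row. The following is a minimal sketch of such 
a harness in plain Java, not the actual benchmark code; AccessTimer, timeRuns 
and fakeLookup are hypothetical names, and the sleep only simulates an 
expensive first call. In a real run the Supplier would wrap the Jena call that 
lists the instances of a disease class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Times repeated runs of one call so the first (cold) run can be compared with later (warm) runs. */
final class AccessTimer {

    /** Runs the call `runs` times and returns the elapsed seconds of each run, in order. */
    static List<Double> timeRuns(Supplier<Integer> call, int runs) {
        List<Double> seconds = new ArrayList<>();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            call.get();
            seconds.add((System.nanoTime() - start) / 1e9);
        }
        return seconds;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the real lookup: it pays a one-time cost on
        // first use, the way a lazily prepared cache would, then answers quickly.
        Supplier<Integer> fakeLookup = new Supplier<Integer>() {
            private boolean prepared = false;

            @Override
            public Integer get() {
                if (!prepared) {
                    try {
                        Thread.sleep(200); // simulated one-time preparation
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    prepared = true;
                }
                return 3; // pretend 3 instances were found
            }
        };

        List<Double> t = timeRuns(fakeLookup, 3);
        System.out.printf("cold: %.3f s, warm: %.3f s and %.3f s%n",
                t.get(0), t.get(1), t.get(2));
    }
}
```

If repeating the Prostate_Cancer access in the same process is fast the second 
time, the cost is a one-time one; repeating it in a fresh process would then 
help tell an in-memory effect from database caching.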

I’ll be running this on our Linux server and will also try TDB for the ICD9 
ontology to see how much difference it makes. But it would be really nice to 
hear if there is a Jena- or SDB-specific explanation for these results.

Performance will be an important consideration for this application. I want to 
be sure I am doing things in a way that gets the best possible performance when 
using Jena.



Loading http://purl.bioontology.org/ontology/HOM_ICD9_inferred/ took 0.359 seconds
Loading http://www.sas.com/hls/patient/ took 0.0 seconds
http://www.sas.com/hls/ex/Prostate_Cancer
3 instances found.
Access took 4080.932 seconds

http://www.sas.com/hls/ex/Breast_Cancer
28 instances found.
Access took 3.313 seconds

http://www.sas.com/hls/ex/Lung_Cancer
9 instances found.
Access took 1.062 seconds

http://www.sas.com/hls/ex/Anal_Cancer
17 instances found.
Access took 1.656 seconds

http://www.sas.com/hls/ex/Colon_Cancer
25 instances found.
Access took 2.641 seconds

http://www.sas.com/hls/ex/Disease
81 instances found.
Access took 4.891 seconds


David Jordan
Software Developer
SAS Institute Inc.
Health & Life Sciences, Research & Development
Bldg R ▪ Office 4467
600 Research Drive ▪ Cary, NC 27513
Tel: 919 531 1233 ▪ [email protected]
www.sas.com
SAS® … THE POWER TO KNOW®


