Hi,

On Mon, 2011-09-19 at 15:04 +0000, David Jordan wrote:

> I have switched over from SDB to TDB to see if I can get better performance.
> In the following, Database is a class of mine that insulates the code from
> knowing if it is SDB or TDB.
>
> I do the following, which combines 2 models I have stored in TDB and then
> reads a third small model from a file that contains some classes I want to
> “test”. I then have some code that times how long it takes to get a
> particular class and list its instances.
>
> Model model1 = Database.getICD9inferredModel();
> Model model2 = Database.getPatientModel();
> OntModel omodel =
>     ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF, model1);
> omodel.add(model2);
That is running a full rule reasoner over the TDB model. As I've mentioned
before, the rule inference engines store everything in memory, so that doesn't
give you any scaling over simply loading the file into memory and doing
inference over that, it just goes very very slowly!

> InputStream in = FileManager.get().open(fileName);
> omodel.read(in, baseName, "TURTLE");
>
> OntClass oclass = omodel.getOntClass(line); // access the class
>
> On the first call to getOntClass, I have been seeing a VERY long wait (around
> an hour) before I get a response.
> Then after that first call, subsequent calls are much faster.
> But I started looking at the CPU utilization. After the call to getOntClass,
> CPU utilization is very close to 0.
> Is this to be expected?

Seems plausible. The inference engines are in effect issuing a huge number of
triple queries to TDB, which will spend most of its time waiting for the disk.
If you really need to run live inference over the entire dataset then load it
into a memory model first, then construct your inference model over that.

> Is there any form of tracing/logging that can be turned on to determine what
> (if anything) is happening?
>
> Is there something I am doing wrong in setting up my models?
> For the ICD9 ontology I am using, I had read in the OWL data, created an
> OntModel with it, wrote this OntModel data out.
> Then I store the data from the OntModel into TDB, so it supposedly does not
> have to do as much work at runtime.

As Chris says, make sure you are using writeAll, not just plain write, to store
the OntModel. That aside, this doesn't necessarily save you much work because
the rules have to run anyway, they are just not discovering anything much new.

In the absence of a highly scalable inference solution for Jena (something
which can't be done without resourcing), your two good options are:

(1) Precompute all inferences, store those, then at runtime work with plain
    (no inference at all) models over that stored closure.

(2) Load all the data into memory and run inference over that.

Dave
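
P.S. In case they help, rough sketches of both options follow. They are only
sketches: I'm reusing your Database wrapper from the code above, and any other
names are just placeholders.

Taking (2) first, since it is the smaller change to your code: copy the
TDB-backed data into plain memory models before the OntModel is built, so the
rule engine never has to touch the disk.

    Model mem = ModelFactory.createDefaultModel();
    mem.add(Database.getICD9inferredModel());   // copy the TDB data into memory
    mem.add(Database.getPatientModel());

    // Inference now runs entirely over the in-memory copy
    OntModel omodel = ModelFactory.createOntologyModel(
            OntModelSpec.OWL_MEM_MICRO_RULE_INF, mem);

    // Then carry on exactly as before
    InputStream in = FileManager.get().open(fileName);
    omodel.read(in, baseName, "TURTLE");
    OntClass oclass = omodel.getOntClass(line);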
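
For (1), the same in-memory inference model can be materialised once, offline,
and the resulting closure stored as plain triples. The TDB location below is
just a placeholder (you would presumably go through your Database class):

    // Offline, one-off step: copy out everything the reasoner can derive
    Model store = TDBFactory.createModel("DB/icd9closure");   // placeholder location
    store.add(omodel);   // copies asserted + inferred statements into TDB
    store.close();

At runtime you then open that stored closure as an ordinary model, with no
reasoner attached, and query it directly; that is where the real speed-up
comes from.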
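
On the write versus writeAll point: write() on an OntModel only serialises the
base model, whereas writeAll() also includes the imports and the inferred
statements, which is what you want if the dump is meant to capture the
precomputed inferences. For example (file name and syntax are whatever you are
already using):

    OutputStream out = new FileOutputStream("icd9-inferred.owl");
    omodel.writeAll(out, "RDF/XML", null);   // base + imports + inferred statements
    out.close();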
