Hi,

On Mon, 2011-09-19 at 15:04 +0000, David Jordan wrote:

> I have switched over from SDB to TDB to see if I can get better performance.
> In the following, Database is a class of mine that insulates the code from
> knowing if it is SDB or TDB.
>
> I do the following, which combines 2 models I have stored in TDB and then
> reads a third small model from a file that contains some classes I want to
> “test”. I then have some code that times how long it takes to get a
> particular class and list its instances.
>
> Model model1 = Database.getICD9inferredModel();
> Model model2 = Database.getPatientModel();
> OntModel omodel =
>     ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF, model1);
> omodel.add(model2);
That is running a full rule reasoner over the TDB model. As I've mentioned
before, the rule inference engines store everything in memory, so that doesn't
give you any scaling over simply loading the file into memory and doing
inference over that, it just goes very very slowly!

> InputStream in = FileManager.get().open(fileName);
> omodel.read(in, baseName, "TURTLE");
>
> OntClass oclass = omodel.getOntClass(line); // access the class
>
> On the first call to getOntClass, I have been seeing a VERY long wait (around
> an hour) before I get a response.
> Then after that first call, subsequent calls are much faster.
> But I started looking at the CPU utilization. After the call to getOntClass,
> CPU utilization is very close to 0.
> Is this to be expected?

Seems plausible. The inference engines are in effect issuing a huge number of
triple queries to TDB, which will spend most of its time waiting for the disk.
If you really need to run live inference over the entire dataset then load it
into a memory model first, then construct your inference model over that.

> Is there any form of tracing/logging that can be turned on to determine what
> (if anything) is happening?
>
> Is there something I am doing wrong in setting up my models?
> For the ICD9 ontology I am using, I had read in the OWL data, created an
> OntModel with it, wrote this OntModel data out.
> Then I store the data from the OntModel into TDB, so it supposedly does not
> have to do as much work at runtime.

As Chris says, make sure you are using writeAll, not just plain write, to store
the OntModel. That aside, this doesn't necessarily save you much work because
the rules have to run anyway, they are just not discovering anything much new.

In the absence of a highly scalable inference solution for Jena (something
which can't be done without resourcing), your two good options are:

(1) Precompute all inferences, store those, then at runtime work with plain
    (no inference at all) models over that stored closure.

(2) Load all the data into memory and run inference over that.

Dave
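
P.S. In case they help, rough sketches of both options follow. They are only
sketches: I'm reusing your Database wrapper from the code above, and any other
names are just placeholders.

Taking (2) first, since it is the smaller change to your code: copy the
TDB-backed data into plain memory models before the OntModel is built, so the
rule engine never has to touch the disk.

    Model mem = ModelFactory.createDefaultModel();
    mem.add(Database.getICD9inferredModel());   // copy the TDB data into memory
    mem.add(Database.getPatientModel());

    // Inference now runs entirely over the in-memory copy
    OntModel omodel = ModelFactory.createOntologyModel(
            OntModelSpec.OWL_MEM_MICRO_RULE_INF, mem);

    // Then carry on exactly as before
    InputStream in = FileManager.get().open(fileName);
    omodel.read(in, baseName, "TURTLE");
    OntClass oclass = omodel.getOntClass(line);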
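
For (1), the same in-memory inference model can be materialised once, offline,
and the resulting closure stored as plain triples. The TDB location below is
just a placeholder (you would presumably go through your Database class):

    // Offline, one-off step: copy out everything the reasoner can derive
    Model store = TDBFactory.createModel("DB/icd9closure");   // placeholder location
    store.add(omodel);   // copies asserted + inferred statements into TDB
    store.close();

At runtime you then open that stored closure as an ordinary model, with no
reasoner attached, and query it directly; that is where the real speed-up
comes from.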
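
On the write versus writeAll point: write() on an OntModel only serialises the
base model, whereas writeAll() also includes the imports and the inferred
statements, which is what you want if the dump is meant to capture the
precomputed inferences. For example (file name and syntax are whatever you are
already using):

    OutputStream out = new FileOutputStream("icd9-inferred.owl");
    omodel.writeAll(out, "RDF/XML", null);   // base + imports + inferred statements
    out.close();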
