owlim-discussion  

Re: [Owlim-discussion] Evaluation of query time

Damyan Ognyanoff
Wed, 01 Sep 2010 06:14:42 -0700

Hi,

A few more notes about your initial post: the observed difference in processing speed between the first run and any subsequent one could also be explained by the absence of inference after the initial run. On any subsequent run the repository is simply initialized from its file image, so when your data is processed again only the parsing takes time; since no new triples appear, no inference is performed and the whole run takes less time to complete (including the query evaluations). This assumes that you are using the GettingStarted application for all your assessments, right?

About the quadratic nature of skos:broaderTransitive: first, it is an owl:TransitiveProperty; it is also owl:inverseOf skos:narrowerTransitive; and finally, it is an rdfs:subPropertyOf skos:semanticRelation. With that in mind, suppose you have a chain of 1000 nodes related by skos:broaderTransitive, i.e. 999 statements like:
n1 skos:broaderTransitive n2 .
n2 skos:broaderTransitive n3 .
...
n999 skos:broaderTransitive n1000 .

- due to its transitivity you will end up with roughly n*n/2 additional statements relating n1 to n3, n4, ... n1000, n2 to n4, n5, ... n1000, and so on up to n998 to n1000
- due to its inverseOf with skos:narrowerTransitive you will end up with the same number of skos:narrowerTransitive statements relating the same nodes in the opposite direction
- and due to the subPropertyOf you will have another n*n statements with skos:semanticRelation (skos:narrowerTransitive is also a subPropertyOf skos:semanticRelation), one for each broader or narrower relation you have.

I hope this remark sheds some light on how it could affect your query evaluation time as the data grows, although that depends on the kind of data with which you are extending the initial dataset. Also check that you do not introduce any cycles (or too many, or too long ones) with any of the transitive relations. Each such cycle will give you another quadratic expansion over the affected portion of your dataset.
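To make the growth concrete, here is a minimal Python sketch that applies only the three rules mentioned above (transitivity, inverseOf, subPropertyOf) to a small chain and to the same chain closed into a cycle. It is an illustration and not how OWLIM performs inference; the function names and the use of integers as node identifiers are assumptions made for the example.

```python
def transitive_closure(edges):
    """Naive fixpoint: keep joining pairs (a, b), (b, d) until nothing new appears."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def inferred_counts(edges):
    """Count statements per property after applying the three rules only."""
    broader = transitive_closure(edges)           # owl:TransitiveProperty
    narrower = {(b, a) for (a, b) in broader}     # owl:inverseOf
    semantic = broader | narrower                 # rdfs:subPropertyOf
    return {
        "broaderTransitive": len(broader),
        "narrowerTransitive": len(narrower),
        "semanticRelation": len(semantic),
    }


def chain_pairs(n):
    """Closed form for a chain of n nodes: every ordered pair (i, j) with i < j."""
    return n * (n - 1) // 2


# A chain of 20 nodes (19 explicit statements) and the same chain closed into a cycle.
chain = [(i, i + 1) for i in range(1, 20)]
cycle = chain + [(20, 1)]

chain_counts = inferred_counts(chain)
cycle_counts = inferred_counts(cycle)
print(chain_counts)
print(cycle_counts)
print(chain_pairs(1000))  # broaderTransitive statements for the 1000-node chain
```

For the 20-node chain, the 19 explicit statements grow to 190 broaderTransitive statements, and closing the chain into a cycle raises that to 400, because every node then reaches every node, itself included.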

Another bottleneck is the use of 'distinct' in your queries: the implementation caches all the unique results and filters out the duplicates. It takes some time to index the results and also some memory to keep that hash index during query evaluation. On its own this should not affect the query evaluation time that drastically, but maybe you are reaching some memory limit where the GC starts taking much more resources ...
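As a rough illustration of why 'distinct' costs memory, the following Python sketch filters duplicates with a hash set that has to hold every unique row until the query finishes. This is a generic technique shown under the assumption that rows are hashable tuples; it is not OWLIM's actual implementation.

```python
def distinct(rows):
    """Yield each row once, in first-seen order, remembering all rows seen so far."""
    seen = set()  # grows with the number of unique results
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row


# Hypothetical result rows for a single-variable SELECT.
rows = [("doc1",), ("doc2",), ("doc1",), ("doc3",), ("doc2",)]
unique = list(distinct(rows))
print(unique)
```

The set only shrinks when evaluation ends, so a query returning tens of thousands of unique rows keeps all of them in memory at once.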

HTH,
Damyan Ognyanov
Ontotext AD

----- Original Message ----- From: "Buddy Rich" <a_little_b...@yahoo.com>
To: "Atanas Kiryakov" <n...@sirma.bg>; <owlim-discussion@ontotext.com>
Sent: Saturday, August 28, 2010 9:47 PM
Subject: Re: [Owlim-discussion] Evaluation of query time


Thank you very much, Ivan and Atanas!
Just for clarity:
- The 10M dataset does not change the "proportions" of the dataset.
- When I say that the repository is not yet built, I mean that the "repositories" directory has not yet been created (i.e. it is the first execution of "GettingStarted" for the ontology).

Now, if possible, I would like to raise three observations/doubts:

1) In the "owlim.ttl" file I chose these options:
        owlim:ruleset "owl-max" ;
        owlim:partialRDFS "true" ;
        owlim:noPersist "true" ;
        owlim:entity-index-size "200000" ;
        owlim:jobsize "200" ;

Should I change these options, especially the index and job sizes (or should the performance be the same)?

2) Could you suggest any references about the "quadratic dependency" with broaderTransitive? Just out of curiosity!

3) I repeated the same kind of query (something like the one I showed you in my first post) both with the concept URI and with the prefLabel.

Something like this:

[query1]
prefix skos:  <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?IdDoc
WHERE {
   ?IdDoc skos:subject <http://www.ex.com/concept#103>.
}

and

[query2]
prefix skos:  <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?IdDoc
WHERE {
   ?IdCon skos:prefLabel "NameConc"@en.
   ?IdDoc skos:subject ?IdCon.
}

I thought the query with the URI would be faster than the one with the prefLabel, but it seems to me that the performance is identical (sometimes the queries with the prefLabel are faster, even for more difficult queries). Is there any rational explanation?

Thank you for your patience!

--- On Fri 27/8/10, Atanas Kiryakov <n...@sirma.bg> wrote:

From: Atanas Kiryakov <n...@sirma.bg>
Subject: Re: [Owlim-discussion] Evaluation of query time
To: "Ivan Peikov" <ivan.pei...@ontotext.com>, owlim-discussion@ontotext.com
Date: Friday 27 August 2010, 12:56
Hi

I am tempted to make a few extra comments

> 1) What do you mean by "not yet built"? Are you querying the repository while
> statements are being inserted? If there is an ongoing process of statement
> loading into the repository this will definitely slow down all queries.
>
> 2) This is normal behavior. After the first several queries lots of data will
> be cached in memory, which allows for faster access. When we assess query
> performance we usually make several runs to warm up the repository and then do
> the actual measuring. I consider this type of measurement closest to real-life
> usage.

This would be a fair explanation for BigOWLIM.
In SwiftOWLIM the indices are kept in memory, so query
evaluation, on its own, should happen at the same speed,
because there is no caching performed by the engine.
Possible explanations are:
- faster fetching, based on cached literals and URIs - as
far as I remember literals are not kept in memory, so when
those should be returned as a result, they need to be loaded
from the storage - some caching phenomena could appear in
such cases
- CPU caching - it could be that some important pieces of
data are cached in the L3 cache of the CPU, which is way
faster than the regular RAM
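
The warm-up procedure Ivan describes in the quoted point 2 could be sketched roughly as follows in Python. The `run_query` callable, the run counts, and the `fake_query` stand-in are all assumptions for illustration; in a real assessment `run_query` would execute the SPARQL query against the repository and consume all of its results.

```python
import statistics
import time


def measure(run_query, warmup_runs=3, measured_runs=5):
    """Discard warm-up runs, then return the median time of the measured runs in ms."""
    for _ in range(warmup_runs):
        run_query()  # results ignored; these runs only fill the caches
    timings = []
    for _ in range(measured_runs):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


# Hypothetical stand-in for a real query call.
calls = []

def fake_query():
    calls.append(1)
    return sum(range(1000))


median_ms = measure(fake_query)
print(f"median: {median_ms:.3f} ms over 5 measured runs")
```

Reporting the median rather than the mean keeps a single slow run (for instance one hit by garbage collection) from dominating the figure.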

> 3) When using the SKOS vocabulary it is not uncommon to generate a huge amount
> of broaderTransitive/narrowerTransitive relations, which then make queries
> harder. This is because the linear growth of the explicit broader relations
> generates a quadratic number of broaderTransitive ones. You can check what is
> the case with you, but my guess is that a big part of those 10M statements are
> actually broaderTransitives (which might be a reason for query performance
> degradation).

This depends on the average depth of the broader/narrower
hierarchy - if the hierarchy is very deep, it can really
increase the size of the repository. Still, most likely this
is not the problem. Think of the extreme example where 1k
concepts are interlinked with a chain of broaderTransitive
... This would lead to inference of 1M new statements,
because of the quadratic dependency referred to by Ivan. Still,
1M new statements should not be such a pain

Another possible explanation of the slowdown with the
larger dataset is the fetching time. Fetching 26K results
can take some time, some caches of the URI/Literal-to-ID
dictionary can get exhausted and a much slower retrieval
from the disc could be required

What is most likely is that the increase of the dataset
changes the complexity of the query evaluation. This could
be for a wide range of non-magical reasons. Is the 10M
dataset just a matter of linear scaling of the 4M one, or does
it change the "proportions" of the dataset? E.g. the new data
are all of a specific sort or make some new patterns.

Note that SwiftOWLIM performs no query optimisations, so
the evaluation plan follows the order of the constraints
from the query. If the number of bindings of ?IdDescendant
in the first pattern increases several times when the
query is evaluated against the larger dataset, then the
entire query evaluation can get slower
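
To illustrate the effect of pattern order when no optimisation is done, here is a small Python sketch of a naive left-to-right join over an in-memory list of triples. It is a toy model under stated assumptions, not SwiftOWLIM's evaluator: the data, the `extend` and `evaluate` helpers, and the shortened identifiers (`c207`, `r0`, `doc0`, ...) are all made up for the example.

```python
def extend(pattern, triple, binding):
    """Return binding extended so that pattern matches triple, or None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None  # constant does not match
    return result


def evaluate(patterns, triples):
    """Join patterns strictly in the given order; record bindings after each one."""
    bindings, sizes = [{}], []
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = extend(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
        sizes.append(len(bindings))
    return bindings, sizes


# Toy data: 3 descendants of c207, one related concept, 100 documents.
triples = [(f"d{i}", "skos:broaderTransitive", "c207") for i in range(3)]
triples.append(("d0", "skos:related", "r0"))
triples += [(f"doc{j}", "skos:subject", "r0" if j < 5 else "other") for j in range(100)]

selective_first = [
    ("?d", "skos:broaderTransitive", "c207"),
    ("?d", "skos:related", "?r"),
    ("?doc", "skos:subject", "?r"),
]
broad_first = list(reversed(selective_first))

rows_a, sizes_a = evaluate(selective_first, triples)
rows_b, sizes_b = evaluate(broad_first, triples)
print(sizes_a, sizes_b)
```

Both orders return the same five documents, but the second one carries 100 intermediate bindings through the first step instead of 3, which is the kind of growth that makes the whole evaluation slower on a larger dataset.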

Cheers
Naso

----------------------------------------------------------
Atanas Kiryakov
Executive Director of Ontotext AD, http://www.ontotext.com
Sirma Group, http://www.sirma.bg
Phone: (+359 2) 974 61 44; Fax: 975 3226
----------------------------------------------------------
There is no mental process that can change the laws of
nature or erase facts.
The function of consciousness is not to create reality, but
to apprehend it.
"Existence is Identity, Consciousness is Identification."
Ayn Rand

>
>
> Cheers,
> Ivan
>
> On Friday 27 August 2010 11:54:49 Buddy Rich wrote:
>> Hi, I am assessing the performance of SwiftOWLIM and I have three
>> questions:
>>
>> 1) It seems to me that the query time when the repository is not yet
>> "built" is slower than when the repository is ready. Is it an accident?
>>
>> 2) It seems to me that if I repeat the same kind of query for very similar
>> nodes during the same execution, the performance improves a lot (e.g. from
>> 400 ms to 100 ms). In other words, how should I carry out the assessment?
>> One query at a time, or can I execute more queries at the same time?
>>
>> 3) I have a query like this:
>>
>> prefix skos: <http://www.w3.org/2004/02/skos/core#>
>>
>> SELECT DISTINCT ?IdRelDoc
>> WHERE
>> {
>> ?IdDescendant skos:broaderTransitive <http://www.ex.it/concept#207>.
>> ?IdDescendant skos:related ?IdRelConcept.
>> ?IdRelDoc skos:subject ?IdRelConcept.
>> }
>>
>> Is it plausible that from an ontology with ~4M statements to one with ~10M,
>> the query time increases from 234 ms (for 10k results) to 70k ms (for 26k
>> results)?
>>
>> Thank you in advance!
>>
>> _______________________________________________
>> OWLIM-discussion mailing list
>> OWLIM-discussion@ontotext.com
>> http://ontotext.com/mailman/listinfo/owlim-discussion
>>