owlim-discussion  

Re: [Owlim-discussion] Evaluation of query time

Atanas Kiryakov
Fri, 27 Aug 2010 05:57:27 -0700

Hi

I am tempted to make a few extra comments.

1) What do you mean by "not yet built"? Are you querying the repository while
statements are being inserted? If statements are still being loaded into the
repository, this will definitely slow down all queries.

2) This is normal behavior. After the first several queries, a lot of data is cached in memory, which allows faster access. When we assess query performance we usually make several runs to warm up the repository and then do the actual measuring. I consider this type of measurement closest to real-life usage.

This would be a fair explanation for BigOWLIM.
In SwiftOWLIM the indices are kept in memory, so query evaluation on its own should run at the same speed, because the engine performs no caching. Possible explanations are:
- faster fetching, based on cached literals and URIs: as far as I remember, literals are not kept in memory, so when they have to be returned as results they need to be loaded from storage, and some caching phenomena could appear in such cases
- CPU caching: it could be that some important pieces of data are cached in the L3 cache of the CPU, which is way faster than regular RAM
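
The warm-up-then-measure procedure from point 2 can be sketched in Python roughly as follows. This is only an illustration: `run_query` is a hypothetical zero-argument callable standing in for whatever client call sends the query and consumes all results, and the default run counts are arbitrary assumptions, not OWLIM recommendations.

```python
import statistics
import time


def measure(run_query, warmup_runs=5, timed_runs=20):
    """Run `run_query` untimed a few times, then time repeated runs.

    `run_query` is a hypothetical callable (an assumption of this sketch)
    that issues one query and iterates over the complete result set.
    Returns the wall-clock timings, in milliseconds, of the timed runs only.
    """
    for _ in range(warmup_runs):
        run_query()  # warm-up: results are discarded, caches get populated

    timings_ms = []
    for _ in range(timed_runs):
        start = time.perf_counter()
        run_query()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


def summarize(timings_ms):
    """Report min/median/max so a single outlier does not dominate."""
    return {
        "min": min(timings_ms),
        "median": statistics.median(timings_ms),
        "max": max(timings_ms),
    }
```

Reporting the median of the timed runs, rather than the first or a single run, keeps the 400 ms vs. 100 ms kind of difference from distorting the comparison.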

3) When using the SKOS vocabulary it is not uncommon to generate a huge number
of broaderTransitive/narrowerTransitive relations, which then make queries
harder. This is because linear growth in the explicit broader relations can generate a quadratic number of broaderTransitive ones. You can check whether this is the case for you, but my guess is that a big part of those 10M statements are
actually broaderTransitive ones (which might be a reason for the query performance
degradation).

This depends on the average depth of the broader/narrower hierarchy: if the hierarchy is very deep, it can really increase the size of the repository. Still, most likely this is not the problem. Think of the extreme example where 1k concepts are interlinked with a chain of broaderTransitive ... This would lead to the inference of about 1M new statements, because of the quadratic dependency Ivan referred to. Still, 1M new statements should not be such a pain.
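
The arithmetic behind the extreme example can be checked with a small Python sketch. It assumes the 1k concepts form one chain of explicit broader links and simply counts the pairs in the transitive closure; whether the inverse narrowerTransitive statements are also materialised depends on the rule set in use, so the doubling at the end is an assumption.

```python
def closure_size(broader_edges):
    """Count (concept, ancestor) pairs in the transitive closure.

    `broader_edges` is an iterable of (child, parent) pairs, i.e. the
    explicit broader statements. Each counted pair corresponds to one
    broaderTransitive statement.
    """
    parents = {}
    for child, parent in broader_edges:
        parents.setdefault(child, set()).add(parent)

    total = 0
    for start in parents:
        seen = set()
        stack = list(parents[start])
        while stack:  # walk upwards, collecting every reachable ancestor
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, ()))
        total += len(seen)
    return total


# A single chain c0 -> c1 -> ... -> c999 (1000 concepts, 999 explicit links).
chain = [(i, i + 1) for i in range(999)]
broader_transitive = closure_size(chain)   # n * (n - 1) / 2 pairs
with_inverse = 2 * broader_transitive      # assumes narrowerTransitive is inferred too
```

For the 1000-concept chain this gives 499,500 broaderTransitive pairs, or 999,000 statements if the inverse direction is counted as well, which matches the "about 1M" estimate.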

Another possible explanation of the slowdown with the larger dataset is the fetching time. Fetching 26K results can take some time: some caches of the URI/Literal-to-ID dictionary can get exhausted, and a much slower retrieval from disk could be required.
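
How abruptly a dictionary cache can stop helping once the result set outgrows it can be illustrated with a toy simulation. The sketch below assumes a plain LRU eviction policy and made-up sizes; it does not describe how OWLIM actually caches its URI/Literal-to-ID dictionary.

```python
from collections import OrderedDict


def hit_rate(lookups, capacity):
    """Fraction of lookups served from an LRU cache holding `capacity` IDs.

    `lookups` is the sequence of IDs that have to be resolved to URIs or
    literals while results are fetched. LRU is an assumption of this sketch.
    """
    cache = OrderedDict()
    hits = 0
    for key in lookups:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True  # miss: would be a slow load from disk
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(lookups) if lookups else 0.0


# The same 10 IDs are resolved three times in a row.
lookups = list(range(10)) * 3
fits = hit_rate(lookups, capacity=10)      # working set fits in the cache
too_small = hit_rate(lookups, capacity=9)  # cache is one entry too small
```

With these toy numbers the hit rate drops from two thirds to zero when the cache is a single entry too small, because a repeated scan under LRU always evicts the entry that is needed next.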

What is most likely is that the increase of the dataset changes the complexity of the query evaluation. This could be for a wide range of non-magical reasons. Is the 10M dataset just a matter of linear scaling of the 4M one, or does it change the "proportions" of the dataset? E.g. the new data are all of a specific sort or form some new patterns.

Note that SwiftOWLIM performs no query optimisations, so the evaluation plan follows the order of the constraints in the query. If the number of bindings of ?IdDescendant in the first patterns increases several times when the query is evaluated against the larger dataset, then the entire query evaluation can get slower.
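
The effect of evaluating patterns strictly in written order can be illustrated with a toy evaluator in Python. This is a simplified model (a naive nested-loop join over an in-memory list of triples), not SwiftOWLIM's implementation, and the data is invented for the example.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None.

    Terms starting with '?' are variables; everything else is a constant.
    """
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None
    return new


def evaluate(patterns, triples):
    """Evaluate patterns left to right, with no reordering.

    Returns (solutions, total number of intermediate bindings produced).
    """
    bindings = [{}]
    work = 0
    for pattern in patterns:
        extended = []
        for b in bindings:
            for triple in triples:
                m = match(pattern, triple, b)
                if m is not None:
                    extended.append(m)
        bindings = extended
        work += len(bindings)
    return bindings, work


# Invented data: two descendants of c207, plus 50 unrelated concept/doc pairs.
triples = [("d1", "broaderTransitive", "c207"), ("d2", "broaderTransitive", "c207"),
           ("d1", "related", "r1"), ("doc1", "subject", "r1")]
for i in range(50):
    triples += [(f"x{i}", "related", f"y{i}"), (f"doc_x{i}", "subject", f"y{i}")]

selective_first = [("?d", "broaderTransitive", "c207"),
                   ("?d", "related", "?r"), ("?doc", "subject", "?r")]
unselective_first = list(reversed(selective_first))

solutions_a, work_a = evaluate(selective_first, triples)
solutions_b, work_b = evaluate(unselective_first, triples)
```

Both orders return the same single solution, but the second one produces 103 intermediate bindings instead of 4. When the engine does not reorder patterns, growth in the number of bindings for the early patterns feeds directly into the total evaluation cost.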

Cheers
Naso

----------------------------------------------------------
Atanas Kiryakov
Executive Director of Ontotext AD, http://www.ontotext.com
Sirma Group, http://www.sirma.bg
Phone: (+359 2) 974 61 44; Fax: 975 3226
----------------------------------------------------------
There is no mental process that can change the laws of nature or erase facts.
The function of consciousness is not to create reality, but to apprehend it.
"Existence is Identity, Consciousness is Identification."
Ayn Rand



Cheers,
Ivan

On Friday 27 August 2010 11:54:49 Buddy Rich wrote:
Hi, I am assessing the performance of SwiftOWLIM and I have three
 questions:

1) It seems to me that the query time when the repository is not yet
 "built" is slower than when the repository is ready. Is it an accident?

2) It seems to me that if I repeat the same kind of query for very similar nodes during the same execution, performance improves a lot (e.g. from 400 ms to 100 ms). So how should I carry out the assessment: one query at a time, or can I execute several queries at the same time?

3) I have a query like this:

prefix skos:  <http://www.w3.org/2004/02/skos/core#>

SELECT Distinct ?IdRelDoc
WHERE
{
    ?IdDescendant skos:broaderTransitive <http://www.ex.it/concept#207>.
    ?idDescendant skos:related ?IdRelConcept.
    ?IdRelDoc skos:subject ?IdRelConcept.
}

Is it plausible that, going from an ontology with ~4M statements to one with ~10M, the query time increases from 234 ms (for 10k results) to 70k ms (for 26k results)?

Thank you in advance!




_______________________________________________
OWLIM-discussion mailing list
OWLIM-discussion@ontotext.com
http://ontotext.com/mailman/listinfo/owlim-discussion
