Atanas Kiryakov
Fri, 27 Aug 2010 05:57:27 -0700
Hi, I am tempted to make a few extra comments.
1) What do you mean by "not yet built"? Are you querying the repository while statements are being inserted? If there is an ongoing process of loading statements into the repository, this will definitely slow down all queries.

2) This is normal behavior. After the first several queries, lots of data will be cached in memory, which allows for faster access. When we assess query performance, we usually make several runs to warm up the repository and then do the actual measuring. I consider this type of measurement closest to real-life usage.
This would be a fair explanation for BigOWLIM. In SwiftOWLIM the indices are kept in memory, so query evaluation on its own should run at the same speed, because the engine performs no caching. Possible explanations are:
- faster fetching, based on cached literals and URIs: as far as I remember, literals are not kept in memory, so when they have to be returned as a result they need to be loaded from storage, and some caching phenomena could appear in such cases;
- CPU caching: it could be that some important pieces of data are cached in the L3 cache of the CPU, which is much faster than regular RAM.
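The warm-up-then-measure procedure described in point 2 could be scripted roughly as follows. This is a minimal sketch, not an OWLIM tool: `run_query` stands for a hypothetical callable that submits the SPARQL query to the repository, and the run counts are arbitrary choices rather than recommended values.

```python
import statistics
import time
from typing import Callable, List


def benchmark(run_query: Callable[[], object],
              warmup_runs: int = 5,
              measured_runs: int = 10) -> List[float]:
    """Run the query a few times to warm up caches, then time further runs.

    Returns the measured durations in milliseconds; warm-up runs are discarded.
    """
    # Warm-up phase: results and timings are ignored, only caches get populated.
    for _ in range(warmup_runs):
        run_query()

    timings_ms: List[float] = []
    for _ in range(measured_runs):
        start = time.perf_counter()
        run_query()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


if __name__ == "__main__":
    # Hypothetical stand-in for a real repository call.
    def fake_query() -> int:
        return sum(range(10_000))

    times = benchmark(fake_query)
    print(f"median {statistics.median(times):.3f} ms over {len(times)} runs")
```

Reporting the median of the measured runs, rather than the mean, keeps a single slow outlier from dominating the figure.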
3) When using the SKOS vocabulary, it is not uncommon to generate a huge amount of broaderTransitive/narrowerTransitive relations, which then make queries harder. This is because a linear growth of the explicit broader relations generates a quadratic number of broaderTransitive ones. You can check whether this is the case for you, but my guess is that a big part of those 10M statements are actually broaderTransitive statements (which might be a reason for the query performance degradation).
This depends on the average depth of the broader/narrower hierarchy: if the hierarchy is very deep, it can really increase the size of the repository. Still, most likely this is not the problem. Think of the extreme example where 1k concepts are interlinked in a single chain of broaderTransitive links. This would lead to the inference of about 1M new statements, because of the quadratic dependency referred to by Ivan. Still, 1M new statements should not be such a pain.
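The quadratic growth in the chain example can be checked with a small calculation. The sketch below is plain Python, not OWLIM's reasoner: it computes the transitive closure of a single chain c0, c1, ..., c(n-1) of links and counts the resulting broaderTransitive pairs, which come to n(n-1)/2; the inverse narrowerTransitive statements double that to n(n-1).

```python
from typing import Dict, Set, Tuple


def transitive_closure(edges: Dict[int, Set[int]]) -> Set[Tuple[int, int]]:
    """Return all (a, b) pairs such that b is reachable from a via one or more edges."""
    closure: Set[Tuple[int, int]] = set()
    for start in edges:
        # Depth-first search from each node collects everything reachable from it.
        stack = list(edges.get(start, ()))
        seen: Set[int] = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(edges.get(node, ()))
    return closure


def chain(n: int) -> Dict[int, Set[int]]:
    """Concept i has a single link to concept i + 1 (the last concept has none)."""
    return {i: {i + 1} for i in range(n - 1)}


if __name__ == "__main__":
    for n in (10, 100, 1000):
        pairs = len(transitive_closure(chain(n)))
        # Columns: concepts, broaderTransitive pairs, pairs including the inverse direction.
        print(n, pairs, 2 * pairs)
    # → 10 45 90
    # → 100 4950 9900
    # → 1000 499500 999000
```

So a 1k-concept chain yields roughly 0.5M broaderTransitive plus 0.5M narrowerTransitive statements, while a shallow hierarchy of the same size stays close to linear.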
Another possible explanation of the slowdown with the larger dataset is the fetching time. Fetching 26K results can take some time; some caches of the URI/Literal-to-ID dictionary can get exhausted, and much slower retrieval from disk could be required.
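To illustrate how a bounded dictionary cache can get exhausted, the sketch below assumes a simple LRU eviction policy (an assumption for illustration; the actual OWLIM cache policy is not described in this thread) and counts hits and misses when the same result set is fetched twice. Once the number of distinct IDs in a result exceeds the cache capacity, a repeated sequential scan misses on every access, so every lookup falls through to the slow path. `LRUCache` and `fetch_twice` are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable, Dict, Iterable


class LRUCache:
    """A bounded ID-to-value cache with least-recently-used eviction."""

    def __init__(self, capacity: int, load: Callable[[int], str]):
        self.capacity = capacity
        self.load = load  # slow path, e.g. reading a literal from disk
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: int) -> str:
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = self.load(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value


def fetch_twice(result_ids: Iterable[int], capacity: int) -> Dict[str, int]:
    """Resolve every ID of a result set twice and report cache statistics."""
    ids = list(result_ids)
    cache = LRUCache(capacity, load=lambda k: f"literal-{k}")
    for _ in range(2):
        for i in ids:
            cache.get(i)
    return {"hits": cache.hits, "misses": cache.misses}


if __name__ == "__main__":
    print(fetch_twice(range(50), capacity=100))   # result fits in the cache
    print(fetch_twice(range(150), capacity=100))  # result exceeds the cache
```

With a capacity of 100, a 50-ID result gives 50 hits on the second pass, while a 150-ID result gives no hits at all: the cost does not grow gradually but jumps once the working set no longer fits.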
What is most likely is that the increase of the dataset changes the complexity of the query evaluation. This could be for a wide range of non-magical reasons. Is the 10M dataset just a matter of linear scaling of the 4M one, or does it change the "proportions" of the dataset? E.g. the new data are all of a specific sort, or they introduce some new patterns.
Note that SwiftOWLIM performs no query optimisations, so the evaluation plan follows the order of the constraints in the query. If the number of bindings of ?IdDescendant in the first patterns increases several times when the query is evaluated against the larger dataset, then the entire query evaluation can get slower.
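The effect of a fixed evaluation order can be sketched as a nested-loop join that processes triple patterns strictly in the order written and records how many intermediate bindings each step produces. This is a toy model with made-up data, not SwiftOWLIM's engine; the functions `match` and `evaluate_in_order` and the sample triples are hypothetical. Terms starting with `?` are variables.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`; return None on a mismatch."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable: bind it or check the existing binding
            if term in extended and extended[term] != value:
                return None
            extended[term] = value
        elif term != value:  # constant that does not match
            return None
    return extended


def evaluate_in_order(patterns: List[Triple],
                      data: List[Triple]) -> Tuple[List[Binding], List[int]]:
    """Nested-loop join over the patterns in the written order (no reordering).

    Returns the final bindings and the number of bindings after each pattern.
    """
    bindings: List[Binding] = [{}]
    counts: List[int] = []
    for pattern in patterns:
        # Every current binding is tried against every triple, so the cost of
        # this step grows with the number of bindings produced by earlier steps.
        next_bindings: List[Binding] = []
        for b in bindings:
            for triple in data:
                extended = match(pattern, triple, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
        counts.append(len(bindings))
    return bindings, counts


DATA = [
    ("d1", "broaderTransitive", "c207"),
    ("d2", "broaderTransitive", "c207"),
    ("d1", "related", "r1"),
    ("d2", "related", "r2"),
    ("doc1", "subject", "r1"),
    ("doc2", "subject", "r1"),
    ("doc3", "subject", "r2"),
]
QUERY = [
    ("?desc", "broaderTransitive", "c207"),
    ("?desc", "related", "?rel"),
    ("?doc", "subject", "?rel"),
]

if __name__ == "__main__":
    results, counts = evaluate_in_order(QUERY, DATA)
    print(sorted({b["?doc"] for b in results}), counts)
    # → ['doc1', 'doc2', 'doc3'] [2, 2, 3]
```

Reversing the pattern order returns the same documents but produces [3, 3, 3] intermediate bindings instead of [2, 2, 3]. In an engine without a reordering optimiser, the written order alone decides which of these costs is paid, and the gap widens as the early patterns match more statements.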
Cheers
Naso

----------------------------------------------------------
Atanas Kiryakov
Executive Director of Ontotext AD, http://www.ontotext.com
Sirma Group, http://www.sirma.bg
Phone: (+359 2) 974 61 44; Fax: 975 3226
----------------------------------------------------------
There is no mental process that can change the laws of nature or erase facts.
The function of consciousness is not to create reality, but to apprehend it. "Existence is Identity, Consciousness is Identification." Ayn Rand
Cheers,
Ivan

On Friday 27 August 2010 11:54:49 Buddy Rich wrote:
Hi,

I am assessing the performance of SwiftOWLIM and I have three questions:

1) It seems to me that queries are slower when the repository is not yet "built" than when the repository is ready. Is it an accident?

2) It seems to me that if I repeat the same kind of query for very similar nodes during the same execution, the performance improves a lot (e.g. from 400 ms to 100 ms). In other words, how should I carry out the assessment? One query at a time, or can I execute several queries at the same time?

3) I have a query like this:

    prefix skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT DISTINCT ?IdRelDoc
    WHERE {
      ?IdDescendant skos:broaderTransitive <http://www.ex.it/concept#207> .
      ?IdDescendant skos:related ?IdRelConcept .
      ?IdRelDoc skos:subject ?IdRelConcept .
    }

Is it plausible that, going from an ontology with ~4M statements to one with ~10M, the query time increases from 234 ms (for 10k results) to 70k ms (for 26k results)?

Thank you in advance!

_______________________________________________
OWLIM-discussion mailing list
OWLIM-discussion@ontotext.com
http://ontotext.com/mailman/listinfo/owlim-discussion