Damyan Ognyanoff
Wed, 01 Sep 2010 06:14:42 -0700
Hi,

a few more notes about your initial post. The difference in processing speed you observed between the first run and any subsequent ones could also be explained by the fact that inference is performed only during the initial run. On any subsequent run, the repository is simply initialized from its file image. When your data is processed again, only the parsing takes time, and since no new triples appear, no inference is performed. The whole run (including the query evaluations) therefore takes less time to complete. This assumes that you are using the GettingStarted application for all your assessments, right?
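The first-run effect described above can be illustrated with a toy forward-chaining store. This is a hypothetical Python sketch, not how SwiftOWLIM is implemented. The `ToyStore` class and its `load` method are invented for illustration, and only one rule is modelled: transitivity of skos:broaderTransitive. Inference is triggered only by statements that are not already stored, so reloading the same data does no inference at all.

```python
"""Toy forward-chaining store (hypothetical; not SwiftOWLIM's implementation).

Only one rule is modelled: transitivity of skos:broaderTransitive.
Inference is triggered only by statements that are not already stored.
"""
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]
TRANSITIVE = "skos:broaderTransitive"


class ToyStore:
    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self.inference_steps = 0  # number of stored triples the rule was applied to

    def load(self, statements: Iterable[Triple]) -> int:
        """Add statements, infer consequences of the new ones, return how many were new."""
        new = [t for t in dict.fromkeys(statements) if t not in self.triples]
        self.triples.update(new)
        agenda = list(new)
        while agenda:
            s, p, o = agenda.pop()
            if p != TRANSITIVE:
                continue
            self.inference_steps += 1
            derived = set()
            for a, q, b in self.triples:
                if q != TRANSITIVE:
                    continue
                if b == s:  # (a p s) + (s p o) => (a p o)
                    derived.add((a, TRANSITIVE, o))
                if a == o:  # (s p o) + (o p b) => (s p b)
                    derived.add((s, TRANSITIVE, b))
            for t in derived - self.triples:
                self.triples.add(t)
                agenda.append(t)
        return len(new)


if __name__ == "__main__":
    chain = [(f"n{i}", TRANSITIVE, f"n{i + 1}") for i in range(1, 5)]  # n1 -> ... -> n5
    store = ToyStore()
    print(store.load(chain), len(store.triples), store.inference_steps)  # first run: 4 10 10
    print(store.load(chain), len(store.triples), store.inference_steps)  # reload: 0 10 10
```

On the second `load` nothing is new, so the agenda stays empty and the inference counter does not move. Only the cost of checking the input remains, which plays the role of parsing in the real application.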
About the quadratic nature of skos:broaderTransitive: first, it is an owl:TransitiveProperty; it is also owl:inverseOf skos:narrowerTransitive; and finally, it is an rdfs:subPropertyOf skos:semanticRelation. With that in mind, suppose you have a chain of 1000 nodes related by skos:broaderTransitive, i.e. statements like:

n1 skos:broaderTransitive n2 .
n2 skos:broaderTransitive n3 .
...
n999 skos:broaderTransitive n1000 .

- due to its transitivity, you will end up with about n*n/2 additional statements relating n1 to n3, n4, ... n1000; n2 to n4, n5, ... n1000; and so on up to n998 to n1000;
- due to its owl:inverseOf with skos:narrowerTransitive, you will end up with the same number of skos:narrowerTransitive statements relating the same nodes in the opposite direction;
- due to the rdfs:subPropertyOf, you will have another n*n statements with skos:semanticRelation (skos:narrowerTransitive is also an rdfs:subPropertyOf skos:semanticRelation), one for each broader or narrower relation you have.

I hope this sheds some light on how it could affect your query evaluation time as the data grows, although that depends on the kind of data you are adding to the initial dataset. Also check that you do not introduce cycles with any of the transitive relations, and in particular not too many or too long ones. Each such cycle gives you another quadratic expansion over the affected portion of your dataset.
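The numbers above can be checked by simply counting. The sketch below is plain Python, not OWLIM, and the function names are hypothetical. It treats skos:broaderTransitive as a set of (subject, object) pairs and computes its transitive closure with a depth-first search from every node. It then derives skos:narrowerTransitive by reversing each pair (owl:inverseOf) and skos:semanticRelation as the union of both (rdfs:subPropertyOf).

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def transitive_closure(edges: Iterable[Pair]) -> Set[Pair]:
    """All (a, c) such that c is reachable from a through one or more edges."""
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
    closure: Set[Pair] = set()
    for start in list(succ):
        seen: Set[str] = set()
        stack = list(succ[start])
        while stack:  # depth-first search from `start`
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(succ.get(node, ()))  # .get does not create empty entries
    return closure


def skos_expansion(broader_transitive: Iterable[Pair]) -> Tuple[int, int, int]:
    """Count broaderTransitive, narrowerTransitive and semanticRelation statements."""
    bt = transitive_closure(broader_transitive)  # owl:TransitiveProperty
    nt = {(b, a) for a, b in bt}                 # owl:inverseOf skos:narrowerTransitive
    sr = bt | nt                                 # both are rdfs:subPropertyOf skos:semanticRelation
    return len(bt), len(nt), len(sr)


if __name__ == "__main__":
    chain = [(f"n{i}", f"n{i + 1}") for i in range(1, 1000)]  # 999 explicit statements
    print(skos_expansion(chain))  # (499500, 499500, 999000)
    ring = [(f"c{i}", f"c{(i + 1) % 10}") for i in range(10)]  # a 10-node cycle
    print(skos_expansion(ring))   # (100, 100, 100)
```

For the 1000-node chain, 999 explicit statements grow to roughly 2M statements in total. In the 10-node cycle, every node becomes related to every node, itself included, so the cycle alone produces k*k statements per property.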
Another bottleneck is the use of 'distinct' in your queries. The implementation caches all unique results and filters out the duplicates, which takes some time to index the results and some memory to keep that hash index during query evaluation. On its own, this shouldn't affect the query evaluation time that drastically. However, you may be reaching some memory limit where the GC starts consuming much more resources...
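Here is a minimal sketch of the general technique behind 'distinct' in a streaming query engine: a hash set of every unique result seen so far. This is hypothetical Python, not the Sesame/OWLIM code. It only shows why the cost is one hash lookup per incoming row, and why the memory held during evaluation grows with the number of unique results.

```python
from typing import Hashable, Iterable, Iterator, Set, TypeVar

Row = TypeVar("Row", bound=Hashable)


def distinct(rows: Iterable[Row]) -> Iterator[Row]:
    """Stream rows, dropping duplicates.

    Every unique row stays in the hash set until iteration ends, so memory grows
    with the number of distinct results for the whole duration of the query.
    """
    seen: Set[Row] = set()
    for row in rows:
        if row not in seen:  # one hash lookup per incoming row
            seen.add(row)
            yield row


if __name__ == "__main__":
    # Hypothetical join output: the same document reached via several concepts.
    joined = ["doc1", "doc2", "doc1", "doc3", "doc2", "doc1"]
    print(list(distinct(joined)))  # ['doc1', 'doc2', 'doc3']
```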
HTH,
Damyan Ognyanov
Ontotext AD

----- Original Message -----
From: "Buddy Rich" <a_little_b...@yahoo.com>
To: "Atanas Kiryakov" <n...@sirma.bg>; <owlim-discussion@ontotext.com>
Sent: Saturday, August 28, 2010 9:47 PM
Subject: Re: [Owlim-discussion] Evaluation of query time

Thank you very much, Ivan and Atanas! Just for clarity:
- The 10M dataset does not change the "proportions" of the dataset.
- When I say that the repository is not yet built, I mean that the "repositories" directory has not been created yet (i.e. it is the first execution of "GettingStarted" for the ontology).
Now, if possible, I would like to raise three observations/questions:

1) In the "owlim.ttl" file I chose these options:
owlim:ruleset "owl-max" ;
owlim:partialRDFS "true" ;
owlim:noPersist "true" ;
owlim:entity-index-size "200000" ;
owlim:jobsize "200" ;
Should I change any of these options, especially the index and job size, or would the performance be the same?
2) Could you point me to any references about the "quadratic dependency" of broaderTransitive? Just out of curiosity!
3) I repeated the same kind of query (similar to the one I showed you in my first post), once with the concept URI and once with the prefLabel. Something like this:

[query1]
prefix skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?IdDoc
WHERE {
  ?IdDoc skos:subject <http://www.ex.com/concept#103> .
}

[query2]
prefix skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?IdDoc
WHERE {
  ?IdCon skos:prefLabel "NameConc"@en .
  ?IdDoc skos:subject ?IdCon .
}

I expected the query with the URI to be faster than the one with the prefLabel, but the performance seems identical to me. Sometimes the queries with the prefLabel are even faster, also for more complex queries. Is there a rational explanation?
Thank you for your patience!

--- On Fri 27/8/10, Atanas Kiryakov <n...@sirma.bg> wrote:

From: Atanas Kiryakov <n...@sirma.bg>
Subject: Re: [Owlim-discussion] Evaluation of query time
To: "Ivan Peikov" <ivan.pei...@ontotext.com>, owlim-discussion@ontotext.com
Date: Friday, 27 August 2010, 12:56

Hi,

I am tempted to make a few extra comments.

> 1) What do you mean by "not yet built"? Are you querying the repository while
> statements are being inserted? If there is an ongoing process of loading
> statements into the repository, this will definitely slow down all queries.
>
> 2) This is normal behavior. After the first several queries, lots of data will
> be cached in memory, which allows for faster access. When we assess query
> performance, we usually make several runs to warm up the repository and then do
> the actual measuring. I consider this type of measurement closest to real-life
> usage.

This would be a fair explanation for BigOWLIM. In SwiftOWLIM the indices are kept in memory. Query evaluation on its own should therefore happen at the same speed, because the engine performs no caching. Possible explanations are:
- faster fetching, based on cached literals and URIs: as far as I remember, literals are not kept in memory, so when they have to be returned as a result they need to be loaded from storage, and some caching phenomena could appear in such cases;
- CPU caching: some important pieces of data could be cached in the L3 cache of the CPU, which is much faster than regular RAM.

> 3) When using the SKOS vocabulary it is not uncommon to generate a huge amount
> of broaderTransitive/narrowerTransitive relations, which then make queries
> harder. This is because linear growth of the explicit broader relations
> generates a quadratic number of broaderTransitive ones. You can check whether
> this is the case for you, but my guess is that a big part of those 10M
> statements are actually broaderTransitive ones (which might be a reason for the
> query performance degradation).
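Ivan's second point, warming up the repository with several runs before measuring, can be sketched as a small timing harness. This is a hypothetical Python helper, not part of OWLIM or Sesame. `run_query` stands for any callable that evaluates one query and consumes all of its results, since fetching can be a significant part of the cost. The default warm-up and run counts are arbitrary assumptions.

```python
import statistics
import time
from typing import Callable, Dict, List


def benchmark(run_query: Callable[[], object], warmup: int = 5, runs: int = 10) -> Dict[str, float]:
    """Time run_query in milliseconds after discarding `warmup` executions."""
    for _ in range(warmup):  # warm-up runs: results and timings are ignored
        run_query()
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Stand-in workload; in practice this would evaluate a query and iterate over all results.
    print(benchmark(lambda: sum(range(100_000)), warmup=3, runs=5))
```

Reporting the median of the measured runs, rather than a single execution, keeps one slow first run from dominating the comparison.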
This depends on the average depth of the broader/narrower hierarchy. If the hierarchy is very deep, it can really increase the size of the repository. Still, most likely this is not the problem. Think of the extreme example where 1k concepts are interlinked with a chain of broaderTransitive... This would lead to the inference of 1M new statements, because of the quadratic dependency Ivan referred to. Still, 1M new statements should not be such a pain.

Another possible explanation of the slowdown with the larger dataset is the fetching time. Fetching 26K results can take some time: some caches of the URI/Literal-to-ID dictionary can get exhausted, and a much slower retrieval from disk could be required.

What is most likely is that the increase of the dataset changes the complexity of the query evaluation. This could be for a wide range of non-magical reasons. Is the 10M dataset just a matter of linear scaling of the 4M one, or does it change the "proportions" of the dataset? E.g. the new data may all be of a specific sort, or may form some new patterns. Note that SwiftOWLIM performs no query optimisations, so the evaluation plan follows the order of the constraints in the query. If the number of bindings of ?IdDescendant in the first patterns increases several times when the query is evaluated against the larger dataset, then the entire query evaluation can get slower.

Cheers,
Naso

----------------------------------------------------------
Atanas Kiryakov
Executive Director of Ontotext AD, http://www.ontotext.com
Sirma Group, http://www.sirma.bg
Phone: (+359 2) 974 61 44; Fax: 975 3226
----------------------------------------------------------
There is no mental process that can change the laws of nature or erase facts. The function of consciousness is not to create reality, but to apprehend it. "Existence is Identity, Consciousness is Identification."
Ayn Rand

> Cheers,
> Ivan
>
> On Friday 27 August 2010 11:54:49 Buddy Rich wrote:
>> Hi, I am assessing the performance of SwiftOWLIM and I have three
>> questions:
>>
>> 1) It seems to me that the query time when the repository is not yet
>> "built" is slower than when the repository is ready. Is this a coincidence?
>>
>> 2) It seems to me that if I repeat the same kind of query for very similar
>> nodes during the same execution, the performance improves a lot (e.g. from
>> 400 ms to 100 ms). In other words, how should I carry out the assessment?
>> One query at a time, or can I execute several queries at the same time?
>>
>> 3) I have a query like this:
>>
>> prefix skos: <http://www.w3.org/2004/02/skos/core#>
>>
>> SELECT Distinct ?IdRelDoc
>> WHERE
>> {
>> ?IdDescendant skos:broaderTransitive <http://www.ex.it/concept#207>.
>> ?idDescendant skos:related ?IdRelConcept.
>> ?IdRelDoc skos:subject ?IdRelConcept.
>> }
>>
>> Is it plausible that, going from an ontology with ~4M statements to one with
>> ~10M, the query time increases from 234 ms (for 10k results) to 70k ms (for
>> 26k results)?
>>
>> Thank you in advance!
>>
>> _______________________________________________
>> OWLIM-discussion mailing list
>> OWLIM-discussion@ontotext.com
>> http://ontotext.com/mailman/listinfo/owlim-discussion
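Atanas's remark that SwiftOWLIM performs no query optimisation, and follows the order of the constraints in the query, can be illustrated with a toy in-memory evaluator. This is a hypothetical Python sketch, not the SwiftOWLIM engine. It assumes a simple nested-loop join over the patterns from left to right. The data is made up, and "concept207" merely stands in for <http://www.ex.it/concept#207>. The evaluator records how many intermediate bindings each step produces.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern matches triple; return None on conflict."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable: must agree with any earlier binding
            if extended.get(term, value) != value:
                return None
            extended[term] = value
        elif term != value:  # constant that does not match
            return None
    return extended


def evaluate_in_order(patterns: List[Triple], data: List[Triple]) -> Tuple[List[Binding], List[int]]:
    """Nested-loop join over the patterns in the order written; no reordering."""
    bindings: List[Binding] = [{}]
    sizes: List[int] = []
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in data:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
        sizes.append(len(bindings))  # intermediate result size after this pattern
    return bindings, sizes


BT, REL, SUBJ = "skos:broaderTransitive", "skos:related", "skos:subject"
DATA = (  # invented toy data
    [(f"c{i}", BT, "concept207") for i in (1, 2, 3)]
    + [(f"c{i}", REL, f"r{i}") for i in (1, 2, 3, 9)]
    + [(f"doc{i}", SUBJ, f"r{i % 10}") for i in range(100)]
)
AS_WRITTEN = [
    ("?IdDescendant", BT, "concept207"),
    ("?IdDescendant", REL, "?IdRelConcept"),
    ("?IdRelDoc", SUBJ, "?IdRelConcept"),
]

if __name__ == "__main__":
    for name, order in (("as written", AS_WRITTEN), ("reversed", AS_WRITTEN[::-1])):
        rows, sizes = evaluate_in_order(order, DATA)
        print(name, sizes, len({r["?IdRelDoc"] for r in rows}))
    # as written [3, 3, 30] 30
    # reversed [100, 40, 30] 30
```

Both orders return the same 30 documents. In the written order, the selective first pattern keeps every intermediate result small, and every later step scales with the number of ?IdDescendant bindings it produces. That matches the effect Atanas describes when the larger dataset yields more descendants.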