Buddy Rich
Sat, 28 Aug 2010 11:48:19 -0700
Thank you very much, Ivan and Atanas!
Just for clarity:
- The 10M dataset does not change the "proportions" of the dataset.
- When I say that the repository is not yet built, I mean that the
"repositories" directory has not yet been created (e.g. it is the first
execution of "GettingStarted" for the ontology).
Now, if possible, I would like to raise three observations/questions:
1) In the "owlim.ttl" file I chose these options:
owlim:ruleset "owl-max" ;
owlim:partialRDFS "true" ;
owlim:noPersist "true" ;
owlim:entity-index-size "200000" ;
owlim:jobsize "200" ;
Should I change these options, especially the index and job sizes (or would
the performance be the same)?
2) Could you suggest any references about the "quadratic dependency" with
broaderTransitive? Just out of curiosity!
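For what it is worth, the quadratic growth can be checked by hand for the simplest case mentioned later in this thread, a single chain of concepts linked by skos:broader. The sketch below is only a back-of-the-envelope illustration in Python: the function names are made up for this example, and it assumes that the reasoner materialises every implied skos:broaderTransitive statement (and, for the doubled figure, every inverse skos:narrowerTransitive one). A real thesaurus is a tree or graph, not one chain, so actual numbers will differ.

```python
def chain_closure_formula(n):
    """Number of (descendant, ancestor) pairs in one chain of n concepts."""
    return n * (n - 1) // 2


def chain_closure_enumerated(n):
    """Same count, obtained by actually walking the chain (hypothetical helper)."""
    # Concept i has concept i + 1 as its direct skos:broader.
    broader = {i: i + 1 for i in range(n - 1)}
    pairs = set()
    for start in range(n):
        node = start
        while node in broader:
            node = broader[node]
            pairs.add((start, node))  # one implied broaderTransitive statement
    return len(pairs)


if __name__ == "__main__":
    n = 1000
    one_direction = chain_closure_formula(n)
    print(n - 1, "explicit broader statements")
    print(one_direction, "broaderTransitive statements")  # 499500
    print(2 * one_direction, "including the narrowerTransitive inverses")  # 999000
```

So 999 explicit statements in a chain give roughly half a million transitive ones, or about one million if the inverse property is materialised as well: linear input, quadratic output.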
3) I repeated the same kind of query (something like the one I showed you in my
first post) both with the concept URI and with the prefLabel.
Something like this:
[query1]
prefix skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?IdDoc
WHERE {
?IdDoc skos:subject <http://www.ex.com/concept#103>.
}
and
[query2]
prefix skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?IdDoc
WHERE {
?IdCon skos:prefLabel "NameConc"@en.
?IdDoc skos:subject ?IdCon.
}
I expected the query with the URI to be faster than the one with the prefLabel,
but the performance seems to be identical (sometimes the queries with
the prefLabel are even faster, also for more complex queries). Is there any
rational explanation?
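Since differences this small can easily be measurement noise, here is a minimal sketch of how the two queries could be timed following Ivan's earlier advice to warm up the repository before measuring. It is plain Python and not tied to any OWLIM or Sesame API: `run_query` is a hypothetical callable that should execute one query and consume all of its results, and the warm-up and run counts are arbitrary assumptions.

```python
import statistics
import time


def time_query(run_query, warmup=5, runs=20):
    """Return (median_ms, min_ms) for run_query, measured after warm-up runs."""
    for _ in range(warmup):
        run_query()  # results discarded; only meant to fill caches
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples), min(samples)


if __name__ == "__main__":
    # Stand-in workload; replace with a function that runs query1 or query2
    # against the repository and iterates over the whole result.
    def fake_query():
        return sum(range(10_000))

    median_ms, min_ms = time_query(fake_query)
    print(f"median {median_ms:.3f} ms, min {min_ms:.3f} ms")
```

Comparing the medians of query1 and query2 over many runs, rather than single executions, should show whether the difference is real.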
Thank you for your patience!
--- On Fri 27/8/10, Atanas Kiryakov <n...@sirma.bg> wrote:
> From: Atanas Kiryakov <n...@sirma.bg>
> Subject: Re: [Owlim-discussion] Evaluation of query time
> To: "Ivan Peikov" <ivan.pei...@ontotext.com>, owlim-discussion@ontotext.com
> Date: Friday 27 August 2010, 12:56
> Hi
>
> I am tempted to make a few extra comments
>
> > 1) What do you mean by "not yet built"? Are you querying the repository
> > while statements are being inserted? An ongoing process of loading
> > statements into the repository will definitely slow down all queries.
> >
> > 2) This is normal behavior. After the first several queries, lots of data
> > will be cached in memory, which allows for faster access. When we assess
> > query performance we usually make several runs to warm up the repository
> > and then do the actual measuring. I consider this type of measurement
> > closest to real-life usage.
>
> This would be a fair explanation for BigOWLIM.
> In SwiftOWLIM the indices are kept in memory, so, query
> evaluation, on its own, should happen with the same speed,
> because there is no caching, performed by the engine.
> Possible explanations are:
> - faster fetching, based on cached literals and URIs - as
> far as I remember literals are not kept in memory, so, when
> those should be returned as a result, they need to be loaded
> from the storage - some caching phenomena could appear in
> such cases
> - CPU caching - it could be that some important pieces of
> data are cached in the L3 cache of the CPU, which is way
> faster than the regular RAM
>
> > 3) When using the SKOS vocabulary it is not uncommon to generate a huge
> > number of broaderTransitive/narrowerTransitive relations, which then make
> > queries harder. This is because linear growth of the explicit broader
> > relations generates a quadratic number of broaderTransitive ones. You can
> > check whether this is the case for you, but my guess is that a big part of
> > those 10M statements are actually broaderTransitive ones (which might be a
> > reason for the query performance degradation).
>
> This depends on the average depth of the broader/narrower
> hierarchy - if the hierarchy is very deep, it can really
> increase the size of the repository. Still, most likely this
> is not the problem. Think of the extreme example where 1k
> concepts are interlinked with a chain of broaderTransitive
> ... This would lead to inference of 1M new statements,
> because of the quadratic dependency referred to by Ivan. Still,
> 1M new statements should not be such a pain.
>
> Another possible explanation of the slowdown with the
> larger dataset is the fetching time. Fetching 26K results
> can take some time; some caches of the URI/Literal-to-ID
> dictionary can get exhausted, and a much slower retrieval
> from the disk could be required.
>
> What is most likely is that the increase of the dataset
> changes the complexity of the query evaluation. This could
> be for a wide range of non-magical reasons. Is the 10M
> dataset just a matter of linear scaling of the 4M one, or does
> it change the "proportions" of the dataset? E.g. the new data
> are all of a specific sort, or make some new patterns.
>
> Note that SwiftOWLIM performs no query optimisations, so
> the evaluation plan follows the order of the constraints
> in the query. If the number of bindings of ?IdDescendant
> in the first pattern increases several times when the
> query is evaluated against the larger dataset, then the
> entire query evaluation can get slower.
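To make the point about evaluation order concrete, here is a toy sketch in Python. It is not how SwiftOWLIM is implemented; it is a naive nested-loop evaluator over an in-memory list of triples, with made-up data and names, that applies the patterns strictly in the order given and counts how many intermediate bindings each order produces.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):          # variable: bind it or check the binding
            if new.setdefault(term, value) != value:
                return None
        elif term != value:               # constant: must be equal
            return None
    return new


def evaluate(patterns, triples):
    """Evaluate patterns in the given order; return (solutions, work)."""
    bindings = [{}]
    work = 0  # total number of intermediate bindings produced
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        work += len(next_bindings)
        bindings = next_bindings
    return bindings, work


# Toy data: 10 concepts with one label each, 100 documents spread over them.
triples = [(f"c{i}", "prefLabel", f"Label{i}") for i in range(10)]
triples += [(f"d{i}", "subject", f"c{i % 10}") for i in range(100)]

by_label = ("?con", "prefLabel", "Label3")
by_subject = ("?doc", "subject", "?con")

selective_first, work_a = evaluate([by_label, by_subject], triples)
selective_last, work_b = evaluate([by_subject, by_label], triples)
print(len(selective_first), work_a)  # 10 solutions, 11 intermediate bindings
print(len(selective_last), work_b)   # 10 solutions, 110 intermediate bindings
```

Both orders return the same 10 documents, but starting with the unselective pattern produces ten times as many intermediate bindings in this toy case. If the first pattern of a query matches many more statements in the larger dataset, the same effect could apply.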
>
> Cheers
> Naso
>
> ----------------------------------------------------------
> Atanas Kiryakov
> Executive Director of Ontotext AD, http://www.ontotext.com
> Sirma Group, http://www.sirma.bg
> Phone: (+359 2) 974 61 44; Fax: 975 3226
> ----------------------------------------------------------
> There is no mental process that can change the laws of
> nature or erase facts.
> The function of consciousness is not to create reality, but
> to apprehend it.
> "Existence is Identity, Consciousness is Identification."
> Ayn Rand
>
> >
> >
> > Cheers,
> > Ivan
> >
> > On Friday 27 August 2010 11:54:49 Buddy Rich wrote:
> >> Hi, I am assessing the performance of SwiftOWLIM and I have three
> >> questions:
> >>
> >> 1) It seems to me that queries are slower when the repository is not yet
> >> "built" than when the repository is ready. Is that a coincidence?
> >>
> >> 2) It seems to me that if I repeat the same kind of query for very similar
> >> nodes during the same execution, the performance improves a lot (e.g. from
> >> 400 ms to 100 ms). In other words, how should I carry out the assessment?
> >> One query at a time, or can I execute more queries at the same time?
> >>
> >> 3) I have a query like this:
> >>
> >> prefix skos: <http://www.w3.org/2004/02/skos/core#>
> >>
> >> SELECT DISTINCT ?IdRelDoc
> >> WHERE
> >> {
> >>   ?IdDescendant skos:broaderTransitive <http://www.ex.it/concept#207>.
> >>   ?IdDescendant skos:related ?IdRelConcept.
> >>   ?IdRelDoc skos:subject ?IdRelConcept.
> >> }
> >>
> >> Is it plausible that, going from an ontology with ~4M statements to one
> >> with ~10M, the query time increases from 234 ms (for 10k results) to
> >> 70k ms (for 26k results)?
> >>
> >> Thank you in advance!
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> OWLIM-discussion mailing list
> >> OWLIM-discussion@ontotext.com
> >> http://ontotext.com/mailman/listinfo/owlim-discussion
> >>