Hello, is there a fundamental difference between these 2 queries? The
first one takes 10 seconds to return an answer; the second one takes almost
6 hours. My repository has the following indexes:
Index Name      Unique  Type   Columns
RDF_QUAD_GOPS   No      Other  G, O, P, S
RDF_QUAD_GPOS   No      Other  G, P, O, S
RDF_QUAD_OGPS   No      Other  O, G, P, S
RDF_QUAD_OPSG   No      Other  O, P, S, G
RDF_QUAD_POSG   No      Other  P, O, S, G
RDF_QUAD_PSOG   No      Other  P, S, O, G
RDF_QUAD_SOPG   No      Other  S, O, P, G
First query: here we select objects of type affymetrix#Probeset, then
filter them on a characteristic (abs_Score) to obtain a subset.
sparql
select
?aLabel count(?a)
where
{
?probeset rdf:type <http://bio2rdf.org/ns/affymetrix#Probeset> .
?probeset <http://bio2rdf.org/ns/affymetrix#abs_Score> ?score .
filter(?score > 0.27151) .
?probeset affymetrix:P1 ?b .
?c ncbi:P2 ?b .
?d ?predicate ?c .
?d rdf:type ncbi:Record .
?e ?predicate1 ?d .
?e rdf:type hhpid:P3 .
?e hhpid:P4 ?f .
?e hhpid:P5 ?g .
{
?f bio2rdf:P6 ?a .
}
UNION
{
?g bio2rdf:P6 ?a .
}
?a rdfs:label ?aLabel .
}
order by desc (count(?a))
;
Second query: the same query as the first, but instead of a set
filtered on a property of affymetrix#Probeset, it uses a random sample
of these objects. The sub-select executes in seconds when I run it
on its own, so I don't think it is the problem.
sparql
select
?aLabel count(?a)
where
{
{
select distinct ?probeset
where {
?probeset rdf:type <http://bio2rdf.org/ns/affymetrix#Probeset> .
}
order by asc (bif:rnd(21306203276, ?probeset)) limit 1666
}
?probeset affymetrix:P1 ?b .
?c ncbi:P2 ?b .
?d ?predicate ?c .
?d rdf:type ncbi:Record .
?e ?predicate1 ?d .
?e rdf:type hhpid:P3 .
?e hhpid:P4 ?f .
?e hhpid:P5 ?g .
{
?f bio2rdf:P6 ?a .
}
UNION
{
?g bio2rdf:P6 ?a .
}
?a rdfs:label ?aLabel .
}
order by desc (count(?a))
;
Thanks for your help,
Marc-Alexandre Nolin
2010/1/21 Ivan Mikhailov <[email protected]>:
> Hello Marc-Alexandre,
>
> The proper trick is
>
> select distinct ?s
> where {
> ?s rdf:type <http://biology.com/Protein> .
> }
> order by asc (bif:rnd(2000000000, ?s))
> limit 100
>
> but this is a costly thing because it should find all proteins during
> the run. If you need random sampling on a regular basis, like choosing
> approx. 1/10000 of database on a random-decimation style then it may be
> convenient to extend the database with some "random-index" property, one
> triple per subject, and filter by it.
>
> Best Regards,
>
> Ivan Mikhailov
> OpenLink Software
> http://virtuoso.openlinksw.com
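
For reference, the "random-index" idea from Ivan's reply might look roughly
like this on the query side. This is only a sketch under assumptions: the
property <http://example.org/ns#randomIndex> is hypothetical, and it is
assumed that each probeset was given exactly one such triple at load time,
holding an integer drawn uniformly from 0..9999. How those triples get
loaded is not shown here.

```sparql
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Sketch only: <http://example.org/ns#randomIndex> is a hypothetical property,
# assumed to hold one uniformly distributed integer in 0..9999 per probeset.
select distinct ?probeset
where {
  ?probeset rdf:type <http://bio2rdf.org/ns/affymetrix#Probeset> .
  ?probeset <http://example.org/ns#randomIndex> ?r .
  # Keeps the subjects whose stored index falls in 0..9,
  # i.e. roughly 10/10000 of the probesets under the assumption above.
  filter (?r < 10) .
}
```

Because the index values are stored rather than generated per query, the
sample is the same on every run and its size is only approximate; changing
the filter bound changes the sampling rate.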