Re: [Virtuoso-users] Randomness in a SPARQL results

Marc-Alexandre Nolin Wed, 03 Feb 2010 21:04:59 +0000

Hello, is there a fundamental difference between these two queries? The
first one takes 10 seconds to return an answer; the second one takes almost
6 hours. My repository has the following indexes:


Index Name      Unique  Type    Columns
RDF_QUAD_GOPS   No      Other   G, O, P, S
RDF_QUAD_GPOS   No      Other   G, P, O, S
RDF_QUAD_OGPS   No      Other   O, G, P, S
RDF_QUAD_OPSG   No      Other   O, P, S, G
RDF_QUAD_POSG   No      Other   P, O, S, G
RDF_QUAD_PSOG   No      Other   P, S, O, G
RDF_QUAD_SOPG   No      Other   S, O, P, G

First query: this one selects objects of type affymetrix#Probeset,
then filters them on a characteristic to obtain a subset.
sparql
select
?aLabel count(?a)
where
{
?probeset rdf:type <http://bio2rdf.org/ns/affymetrix#Probeset> .
?probeset <http://bio2rdf.org/ns/affymetrix#abs_Score> ?score .
filter(?score > 0.27151) .
?probeset affymetrix:P1 ?b .
?c ncbi:P2 ?b .
?d ?predicate ?c .
?d rdf:type ncbi:Record .
?e ?predicate1 ?d .
?e rdf:type hhpid:P3 .
?e hhpid:P4 ?f .
?e hhpid:P5 ?g .
{
?f bio2rdf:P6 ?a .
}
UNION
{
?g bio2rdf:P6 ?a .
}
?a rdfs:label ?aLabel .
}
order by desc (count(?a))
;

Second query: the same query as the first, but instead of a set
filtered on a property of affymetrix#Probeset, it uses a random sample
of these objects. The sub-select executes in seconds when I run it
alone, so I don't think it is the problem.
sparql
select
?aLabel count(?a)
where
{
   {
    select distinct ?probeset
    where {
       ?probeset rdf:type <http://bio2rdf.org/ns/affymetrix#Probeset> .
    }
    order by asc (bif:rnd(21306203276, ?probeset)) limit 1666
   }
?probeset affymetrix:P1 ?b .
?c ncbi:P2 ?b .
?d ?predicate ?c .
?d rdf:type ncbi:Record .
?e ?predicate1 ?d .
?e rdf:type hhpid:P3 .
?e hhpid:P4 ?f .
?e hhpid:P5 ?g .
{
?f bio2rdf:P6 ?a .
}
UNION
{
?g bio2rdf:P6 ?a .
}
?a rdfs:label ?aLabel .
}
order by desc (count(?a))
;

Thanks for your help,

Marc-Alexandre Nolin

2010/1/21 Ivan Mikhailov <[email protected]>:
> Hello Marc-Alexandre,
>
> The proper trick is
>
> select distinct ?s
> where {
>  ?s rdf:type <http://biology.com/Protein> .
>  }
> order by asc (bif:rnd(2000000000, ?s))
> limit 100
>
> but this is costly because it has to find all proteins during
> the run. If you need random sampling on a regular basis, such as choosing
> approx. 1/10000 of the database in a random-decimation style, then it may be
> convenient to extend the database with a "random-index" property, one
> triple per subject, and filter by it.
>
> Best Regards,
>
> Ivan Mikhailov
> OpenLink Software
> http://virtuoso.openlinksw.com
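
A minimal sketch of the "random-index" idea from the quoted reply, not
something tested in this thread. It assumes every probeset has already been
given exactly one extra triple with a hypothetical property
<http://example.org/randomIndex> whose value is a uniform random number in
[0, 1). Neither that property nor the loading step appears in the original
messages.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?probeset
WHERE {
  ?probeset rdf:type <http://bio2rdf.org/ns/affymetrix#Probeset> .
  # Hypothetical property: one triple per subject, value assumed uniform in [0, 1).
  ?probeset <http://example.org/randomIndex> ?r .
  # Keeps roughly 1/10000 of the subjects; adjust the threshold for other sample sizes.
  FILTER(?r < 0.0001)
}
```

With this approach the sample size is only approximate, and the same
subjects come back on every run until the random-index triples are
regenerated.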
