Re: [Virtuoso-users] Randomness in a SPARQL results

Ivan Mikhailov Fri, 22 Jan 2010 03:30:07 +0000

Hello Marc-Alexandre,

> select ?go count(?go)
> where
> {
>    {select distinct ?protein
>    where {
>    ?protein rdf:type <http://biology.com/Protein> .
>    }
>    order by asc (bif:rnd(2000000000, ?protein))
>    limit 100 } .
>    ?protein <http://biology.com/xGO> ?go .
> }
> order by count(?go)
>


> I would still like to find the bif:rnd specification.

Its first argument is the range of possible results: a value of N means
"random integers from zero to N-1".
The function ignores any other arguments entirely, but the SQL optimizer
does not know that, so the query calls the function once for every
?protein. bif:rnd(2000000000), by contrast, would be calculated only once
at the very beginning of the query, resulting in a completely ineffective
"order by constant".
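
The difference between a per-row key and a constant key can be sketched
outside the database. The following is only an illustration in Python,
not Virtuoso code: rnd() here is a hypothetical stand-in that mimics the
described behaviour of bif:rnd (integer from zero to N-1, extra arguments
ignored), and the protein names are made up.

```python
import random

def rnd(n, *_ignored):
    """Stand-in for bif:rnd: integer in 0..n-1; extra arguments are ignored."""
    return random.randrange(n)

# Made-up rows playing the role of the ?protein bindings.
proteins = [f"protein{i}" for i in range(10)]

# Key computed once, before sorting: every row gets the same key,
# so the sort leaves the rows in their original order.
constant_key = rnd(2000000000)
by_constant = sorted(proteins, key=lambda p: constant_key)

# Key computed per row: every row gets its own random key,
# so the sorted order is a random permutation.
by_row = sorted(proteins, key=lambda p: rnd(2000000000, p))

# "limit 3" on the shuffled rows gives a random sample.
sample = by_row[:3]
```

With the constant key the "limit" always returns the same leading rows;
with the per-row key it returns a different sample on each run.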

To seed the random number generator, use randomize(seed).

A subtle problem exists with functions that use rnd() to write
randomized data to tables of the database. Suppose that the transaction
log is disabled by the log_enable() function, that the application logs
the function call instead of the actual rows changed by transactions,
and that the server dies before the next checkpoint. The transaction log
replay will then run the function call with a different randomization
seed, resulting in different table contents and an unpredictable result
when the rest of the log is replayed.

To avoid that, call randomize() before calling the procedure that uses
rnd(), and log the call of randomize() with the same argument before
the call of that procedure. The application should also prevent several
such functions from running in parallel (say, by using the atomic()
function). Alternatively, the application should not disable the
transaction log. As a third option, the application should run atomic
and make a checkpoint as soon as the randomized data are prepared.
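
The first remedy can be illustrated with a small simulation. This is a
sketch in Python under stated assumptions, not Virtuoso code:
random.seed() stands in for randomize(), fill_table() is a hypothetical
procedure that uses rnd(), and the "log" is a plain list of logged
calls.

```python
import random

def fill_table(n_rows):
    """Hypothetical procedure that writes randomized rows."""
    return [random.randrange(2000000000) for _ in range(n_rows)]

def run_and_log(seed, n_rows, log):
    # Log the seeding call first, then the procedure call itself,
    # so that a replay sees them in the same order.
    log.append(("randomize", seed))
    random.seed(seed)
    log.append(("fill_table", n_rows))
    return fill_table(n_rows)

def replay(log):
    """Re-run the logged calls in order and return the rebuilt rows."""
    table = None
    for op, arg in log:
        if op == "randomize":
            random.seed(arg)
        elif op == "fill_table":
            table = fill_table(arg)
    return table

log = []
original = run_and_log(12345, 5, log)

# Simulate a restart: the generator is now in an unrelated state.
random.seed(999)
replayed = replay(log)
```

Because the seed is logged before the procedure call, the replay
rebuilds the same rows. The sketch is single-threaded, which corresponds
to the requirement that such functions do not run in parallel.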

Best Regards,

Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com
