Hi Dave!

This was exactly the answer I was looking for - thank you very much! I'll dive 
into JenaRules and see how accurately the current SPARQL queries would 
translate.

I only saw your mail after sending mine; sorry for the extra traffic on the 
list.

Mikko

On 28. Oct 2011, at 1:14 PM, Dave Reynolds wrote:

> Hi,
> 
> On Fri, 2011-10-28 at 08:16 +0000, Rinne Mikko wrote: 
>> Hi!
>> 
>> New to Jena and the list, please bear with me if this has been explained 
>> over and over again. I had so far no luck with the documentation, mailing 
>> list archives or googling, so here we go:
>> 
>> Can ARQ be used to execute multiple parallel SPARQL queries?
>> 
>> I would like to configure e.g. 100 or 1000 queries and then run them against 
>> a single file of triples. I wrote a piece of code to run the queries in 
>> sequence and got surprisingly good performance with brute force, but I would 
>> expect going through the dataset only once to perform much better.
>> 
>> If ARQ doesn't support this, should I be looking at the Jena forward-chaining 
>> RETE engine <http://jena.sourceforge.net/inference/> and translating the 
>> SPARQL queries manually?
> 
> Like Paolo I'm not quite sure what you are trying to do but based on
> that question let me take a guess ...
> 
> It sounds like you have your data, maybe in a memory model, and want to
> run a *lot* of queries over that single data set. You suspect that
> instead of each query starting over again maybe you could stream the
> data once through some sort of query sieve to do all the queries at once
> in one pass. Is that about right?
> 
> If so then there is no specific parallel-SPARQL-query support in Jena
> but as you say it might be possible to use the RETE engine depending on
> the specifics of what you are doing.
> 
> As an aside, note it is possible to issue SPARQL queries in parallel (most
> of the Jena stores are Multiple Reader Single Writer), so on a
> multi-core machine you might get extra speed from the brute force query
> approach by spreading the queries across a small number of threads.
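The brute-force-in-parallel idea above can be sketched as follows. This is a minimal, hypothetical sketch: `runQuery` is a stub standing in for the real Jena call (with Jena it would run the query string against the shared model and count the rows of the result set, which is safe to do concurrently since, as noted above, most Jena stores are Multiple Reader Single Writer):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelQueries {

    // Stub for executing one SPARQL query and counting its results.
    // With Jena this would query the shared in-memory model; here we
    // just count tokens so the sketch stays self-contained.
    static int runQuery(String queryString) {
        return queryString.split("\\s+").length;
    }

    // Spread the brute-force queries across a small fixed pool of threads.
    static int runAll(List<String> queries, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String q : queries) {
                futures.add(pool.submit(() -> runQuery(q)));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get(); // blocks until that query finishes
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> queries = List.of(
                "SELECT ?s WHERE { ?s ?p ?o }",
                "SELECT ?o WHERE { ?s ?p ?o }");
        System.out.println(runAll(queries, 2)); // prints 16
    }
}
```

On a multi-core machine a pool of a handful of threads is usually enough; with hundreds of queries the pool size, not the query count, bounds the concurrency.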
> 
> The RETE engine works by keeping tables of partially matched triple
> patterns so that each new triple is matched against the rules
> incrementally, which does seem related to what you want.
> 
> The problem is that JenaRules is not SPARQL - there are no equivalents
> of SPARQL constructs like UNION, ORDER BY, DISTINCT etc. and the set of
> built-in predicates for filtering is different. Furthermore, all you can
> do when a rule matches is assert a set of triples (or call some Java
> code like Print) - you don't have access to a stream of binding results
> in the way you do with SPARQL.
> 
> However, if your queries are primarily just basic graph patterns and if
> the results from your queries can be expressed as new triples then you
> could indeed use the RETE engine. 
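For illustration, a query whose pattern and result both fit the triple model might translate to a rule along these lines (the prefix and property names are made up for the example; the syntax is the rule format described in the inference documentation linked below):

```
@prefix ex: <http://example.org/>.

[contacts: (?a ex:knows ?b) (?b ex:worksAt ?c)
    -> (?a ex:hasContactAt ?c)]
```

Anything like ORDER BY or DISTINCT in the original query has no counterpart here and would have to be handled outside the rules.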
> 
> Whether it will gain you any benefit depends on the specifics. If there
> are a lot of shared patterns between your rules then it might. If not,
> the overheads of the rule machinery may outweigh the gain from reuse of
> partial matches.
> 
> I would suggest you try a small experiment first to measure the
> cost/gains before committing to it.
> 
>> Ultimately I would like to track the processing of each new triple from the 
>> dataset, in case it matches a query.
> 
> That is the way the RETE engine works. So long as you are only adding
> and not removing triples (and so long as you don't have any nasty
> non-monotonic operators in your rules) then each triple added to the
> model is filtered through the RETE network to see if it triggers more
> rules.
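The incremental behaviour Dave describes can be illustrated with a toy matcher (this is not Jena code, just a self-contained sketch of the idea: the standing patterns are kept around, and each newly added triple is checked against them as it arrives instead of re-querying the whole dataset):

```java
import java.util.ArrayList;
import java.util.List;

public class IncrementalMatch {
    // A concrete triple, and a pattern with "?" as a wildcard position.
    record Triple(String s, String p, String o) {}
    record Pattern(String s, String p, String o) {
        boolean matches(Triple t) {
            return (s.equals("?") || s.equals(t.s))
                    && (p.equals("?") || p.equals(t.p))
                    && (o.equals("?") || o.equals(t.o));
        }
    }

    // The registered patterns play the role of the standing queries.
    static final List<Pattern> patterns = List.of(
            new Pattern("?", "rdf:type", "ex:Person"),
            new Pattern("?", "ex:knows", "?"));

    // Called once per added triple; returns indices of triggered patterns.
    static List<Integer> onAdd(Triple t) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matches(t)) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(onAdd(new Triple("ex:alice", "rdf:type", "ex:Person"))); // [0]
        System.out.println(onAdd(new Triple("ex:alice", "ex:knows", "ex:bob")));    // [1]
    }
}
```

A real RETE network goes further by also storing joins of partially matched patterns, which is where the reuse across rules comes from.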
> 
>> Any proposals on good documentation?
> 
> The primary documentation for the rules engine is:
> 
> http://incubator.apache.org/jena/documentation/inference/index.html#rules
> 
> Dave
> 
> 
