Re: [d2rq-dev] D2RQ bad performance in Jena

Anastasiya Goncharova Wed, 03 Apr 2013 02:11:30 -0700

I think I need to provide the query that I run.

The SPARQL query is:


PREFIX vocab: <http://www.vocab.de/vocab/>
SELECT ?x ?y ?z {
  ?id vocab:port39390_facts_relation ?z .
  ?id vocab:port39390_facts_arg1 ?x .
  ?id vocab:port39390_facts_arg2 ?y .
  FILTER (?z = 'isLeaderOf')
}

D2RQ rewrites this query into two SQL queries (as logged to the console):

11:46:01 INFO  SQLIterator          :: SELECT DISTINCT
"T2_port39390_facts"."arg2", "T1_port39390_facts"."id",
"T3_port39390_facts"."arg1" FROM "port39390"."facts" AS
"T1_port39390_facts", "port39390"."facts" AS "T2_port39390_facts",
"port39390"."facts" AS "T3_port39390_facts" WHERE
("T1_port39390_facts"."id" = "T2_port39390_facts"."id" AND
"T1_port39390_facts"."relation" = 'isLeaderOf' AND
"T1_port39390_facts"."relation" IS NOT NULL AND "T2_port39390_facts"."arg2"
IS NOT NULL AND "T2_port39390_facts"."id" = "T3_port39390_facts"."id" AND
"T3_port39390_facts"."arg1" IS NOT NULL)

11:46:01 INFO  SQLIterator          :: SELECT DISTINCT
"T1_port39390_facts"."id", "T4_port39390_facts"."arg2",
"T2_port39390_facts"."relation", "T3_port39390_facts"."arg1" FROM
"port39390"."facts" AS "T1_port39390_facts", "port39390"."facts" AS
"T4_port39390_facts", "port39390"."facts" AS "T2_port39390_facts",
"port39390"."facts" AS "T3_port39390_facts" WHERE
("T1_port39390_facts"."id" = "T4_port39390_facts"."id" AND
"T2_port39390_facts"."id" = "T4_port39390_facts"."id" AND
"T2_port39390_facts"."relation" IS NOT NULL AND "T3_port39390_facts"."arg1"
IS NOT NULL AND "T3_port39390_facts"."id" = "T4_port39390_facts"."id" AND
"T4_port39390_facts"."arg2" IS NOT NULL)

So the second query joins four copies of the table. For three of the copies
it does a full scan on one of their columns (arg1, arg2, relation), and for
the fourth copy a full scan on all columns. Unlike the first query, it also
contains no "relation" = 'isLeaderOf' condition. Is it possible to prevent
this behaviour, and what is the difference between evaluating the query from
Jena and from the command line using d2r-query?
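
To make the concern about the second query concrete, here is a minimal,
self-contained Java sketch. It does not use D2RQ or Jena at all: the class
FilterPlacementSketch, the Fact type and the sample rows are hypothetical
stand-ins for the facts table. It only assumes, based on the two logged
statements, that the relation test is part of the WHERE clause in one case
and has to be applied after the rows are fetched in the other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: no D2RQ or Jena API is used here.
final class FilterPlacementSketch {

    // Stand-in for one row of the "facts" table
    // (assumed columns: id, relation, arg1, arg2).
    static final class Fact {
        final int id;
        final String relation;
        final String arg1;
        final String arg2;

        Fact(int id, String relation, String arg1, String arg2) {
            this.id = id;
            this.relation = relation;
            this.arg1 = arg1;
            this.arg2 = arg2;
        }
    }

    // Mirrors the IS NOT NULL checks that appear in both logged queries.
    static boolean isComplete(Fact f) {
        return f.relation != null && f.arg1 != null && f.arg2 != null;
    }

    // Like the first logged query: the relation test is evaluated together
    // with the NOT NULL checks, so only matching rows are fetched.
    static List<Fact> fetchWithRelationTest(List<Fact> table, String relation) {
        List<Fact> fetched = new ArrayList<>();
        for (Fact f : table) {
            if (isComplete(f) && f.relation.equals(relation)) {
                fetched.add(f);
            }
        }
        return fetched;
    }

    // Like the second logged query: only the NOT NULL checks are applied,
    // so every complete row is fetched.
    static List<Fact> fetchWithoutRelationTest(List<Fact> table) {
        List<Fact> fetched = new ArrayList<>();
        for (Fact f : table) {
            if (isComplete(f)) {
                fetched.add(f);
            }
        }
        return fetched;
    }

    // The relation test applied afterwards, outside the database.
    static List<Fact> filterAfterFetch(List<Fact> fetched, String relation) {
        List<Fact> kept = new ArrayList<>();
        for (Fact f : fetched) {
            if (f.relation.equals(relation)) {
                kept.add(f);
            }
        }
        return kept;
    }

    // Made-up sample rows; the last one is incomplete (arg2 is NULL).
    static List<Fact> sampleTable() {
        return Arrays.asList(
            new Fact(1, "isLeaderOf", "a1", "b1"),
            new Fact(2, "bornIn", "a2", "b2"),
            new Fact(3, "isLeaderOf", "a3", "b3"),
            new Fact(4, "livesIn", "a4", null));
    }

    public static void main(String[] args) {
        List<Fact> table = sampleTable();
        List<Fact> early = fetchWithRelationTest(table, "isLeaderOf");
        List<Fact> all = fetchWithoutRelationTest(table);
        List<Fact> late = filterAfterFetch(all, "isLeaderOf");
        System.out.println("rows fetched with relation test in WHERE: " + early.size());
        System.out.println("rows fetched without it: " + all.size());
        System.out.println("rows left after filtering outside the database: " + late.size());
    }
}
```

Both paths end with the same rows, but the second one first has to move
every complete row out of the database. If that is what happens with the
second SQL query, it might explain the runtime on 650 million rows.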



2013/4/2 Anastasiya Goncharova <slmn...@gmail.com>

> Hello everyone,
>
> I have a large dataset that contains about 650 million rows. I am trying
> to evaluate a query that returns about 8000 rows. When I run this query
> from the command line using d2r-query, the result is returned fast enough.
> But when I evaluate the same query from Jena using
>
> ResultSet rs = QueryExecutionFactory.create(query, RDFModel).execSelect();
>
> it takes too long. I waited for several hours and then terminated the
> application before the evaluation had finished. Why does this happen, and
> how can I improve the runtime?
>
> Best,
>
> Anastasiya
>

_______________________________________________
d2rq-map-devel mailing list
d2rq-map-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/d2rq-map-devel
