Re: [Nepomuk] Virtuoso Problems - nao:userVisible

Gaël Beaudoin Wed, 22 Aug 2012 01:22:57 -0700

On 22/08/2012 07:46, Vishesh Handa wrote:

Hey everyone
In 4.9, most of the queries on large datasets are impossibly slow and often cause virtuoso to completely lock up. So I've been going through the common queries that are passed to Nepomuk from a user perspective and been trying to optimize them.
The most prevalent problem is that of user visibility.
Simple queries like listing all the tags seem to blow out of proportion with the added " FILTER EXISTS { ?r a [ nao:userVisible "true"^^xsd:boolean ] . }". If one looks at the SQL that is being generated, one can see a drastic difference:
"select ?r where { ?r a nao:Tag . }"

SELECT __id2i ( "s_1_0-t0"."S" ) AS "r"
FROM DB.DBA.RDF_QUAD AS "s_1_0-t0"
WHERE "s_1_0-t0"."P" = __i2idn ( __bft('http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
  AND  isiri_id ( "s_1_0-t0"."O")
AND "s_1_0-t0"."O" = __i2idn ( __bft('http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
OPTION (QUIETCAST)
"select ?r where { ?r a nao:Tag . FILTER EXISTS { ?r a [nao:userVisible "true"^^xsd:boolean ] . } }"
SELECT __id2i ( "s_1_0-t0"."S" ) AS "r"
FROM DB.DBA.RDF_QUAD AS "s_1_0-t0"
WHERE "s_1_0-t0"."P" = __i2idn ( __bft('http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
  AND  isiri_id ( "s_1_0-t0"."O")
AND "s_1_0-t0"."O" = __i2idn ( __bft('http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
  AND  EXISTS ( (
     SELECT TOP 1 1 AS __ask_retval
      FROM DB.DBA.RDF_QUAD AS "s_1_4-t1"
        INNER JOIN DB.DBA.RDF_QUAD AS "s_1_4-t2"
        ON ( "s_1_4-t1"."S"  = "s_1_4-t2"."O" )
WHERE "s_1_4-t1"."P" = __i2idn ( __bft('http://www.semanticdesktop.org/ontologies/2007/08/15/nao#userVisible', 1))
        AND  (1 - isiri_id ( "s_1_4-t1"."O"))
        AND  "s_1_4-t1"."O" = DB.DBA.RDF_OBJ_OF_SQLVAL ( 1)
AND "s_1_4-t2"."P" = __i2idn ( __bft('http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
        AND  isiri_id ( "s_1_4-t2"."O")
        AND  "s_1_4-t2"."S"  = "s_1_0-t0"."S"
OPTION (QUIETCAST)
     ))
OPTION (QUIETCAST)
The second query results in an added query on every single result, and that additional query also contains an added join.
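To make the shape of that second query concrete, here is a toy model in Python with SQLite. It is only a sketch: the `quad` table, its columns, and the sample data are hypothetical stand-ins for DB.DBA.RDF_QUAD, plain strings replace IRI ids, and SQLite's planner is not Virtuoso's, so it illustrates the structure of the correlated EXISTS rather than its cost.

```python
import sqlite3

# Hypothetical stand-in for DB.DBA.RDF_QUAD; plain strings replace IRI ids.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quad (g TEXT, s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO quad VALUES (?, ?, ?, ?)",
    [
        ("g:onto", "nao:Tag", "nao:userVisible", "true"),
        ("g:onto", "nrl:Graph", "nao:userVisible", "false"),
        ("g:data", "tag:1", "rdf:type", "nao:Tag"),
        ("g:data", "tag:2", "rdf:type", "nao:Tag"),
        ("g:data", "graph:1", "rdf:type", "nrl:Graph"),
    ],
)

# Same shape as the generated SQL: for each candidate row t0, the EXISTS
# subquery joins the resource's types (t2) to the visible types (t1).
VISIBLE_TAGS = """
SELECT t0.s
FROM quad AS t0
WHERE t0.p = 'rdf:type' AND t0.o = 'nao:Tag'
  AND EXISTS (
    SELECT 1
    FROM quad AS t1
    INNER JOIN quad AS t2 ON t1.s = t2.o
    WHERE t1.p = 'nao:userVisible' AND t1.o = 'true'
      AND t2.p = 'rdf:type'
      AND t2.s = t0.s
  )
ORDER BY t0.s
"""

visible_tags = [row[0] for row in conn.execute(VISIBLE_TAGS)]
print(visible_tags)
```

The subquery is re-evaluated for each row matched by the outer query, which is why the cost grows with the number of tags.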
On my system with 13k tags (yeah, I know), the system is completely unusable. Virtuoso's CPU usage climbs to 200% and it takes about 5 minutes to respond. While I don't expect anyone to have 13k tags, people do have that many contacts or emails.
Options on how to fix -

1.) Use graphs with a filter
select ?r where { graph ?g { ?r a nao:Tag . } FILTER NOT EXISTS { ?g a nrl:Ontology. } }
_______________________________________________________________________________

SELECT __id2i ( "s_1_1-t0"."S" ) AS "r"
FROM DB.DBA.RDF_QUAD AS "s_1_1-t0"
WHERE "s_1_1-t0"."P" = __i2idn ( __bft('http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
  AND  isiri_id ( "s_1_1-t0"."O")
AND "s_1_1-t0"."O" = __i2idn ( __bft('http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
  AND  not ( EXISTS ( (
     SELECT TOP 1 1 AS __ask_retval
      FROM DB.DBA.RDF_QUAD AS "s_1_4-t1"
WHERE "s_1_4-t1"."P" = __i2idn ( __bft('http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
        AND  isiri_id ( "s_1_4-t1"."O")
AND "s_1_4-t1"."O" = __i2idn ( __bft('http://www.semanticdesktop.org/ontologies/2007/08/15/nrl#Ontology' , 1))
        AND  "s_1_4-t1"."S"  = "s_1_1-t0"."G"
OPTION (QUIETCAST)
     )))
OPTION (QUIETCAST)
This also results in an additional SQL query per resource, but it's still a LOT faster (no join in the exists query).
2.) Use graphs via nao:maintainedBy
select ?r where { graph ?g { ?r a nao:Tag . } ?g nao:maintainedBy ?app. }
_______________________________________________________________________________

SELECT __id2i ( "s_1_1-t0"."S" ) AS "r"
FROM DB.DBA.RDF_QUAD AS "s_1_1-t0"
  INNER JOIN DB.DBA.RDF_QUAD AS "s_1_0-t1"
  ON ( "s_1_0-t1"."S"  = "s_1_1-t0"."G" )
WHERE "s_1_1-t0"."P" = __i2idn ( __bft('http://www.w3.org/1999/02/22-rdf-syntax-ns#type' , 1))
  AND  isiri_id ( "s_1_1-t0"."O")
AND "s_1_1-t0"."O" = __i2idn ( __bft('http://www.semanticdesktop.org/ontologies/2007/08/15/nao#Tag' , 1))
  AND  ( "s_1_0-t1"."S" < min_bnode_iri_id ())
AND "s_1_0-t1"."P" = __i2idn ( __bft('http://www.semanticdesktop.org/ontologies/2007/08/15/nao#maintainedBy' ,1))
OPTION (QUIETCAST)
This would be the ideal solution; however, it would break backward compatibility because not all graphs have a nao:maintainedBy statement.
3.) Go SQL and add another column to our RDF_QUAD table which is indexed. That way we can always filter statements on the basis of visibility. Would be considerably faster than the join.
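As a rough illustration of option 3, the sketch below models a quad table with an extra indexed visibility flag, again using Python with SQLite rather than Virtuoso. The table, the `visible` column, the `quad_pov` index, and the data are all assumptions made up for the example; how such a column would actually be added to and maintained in RDF_QUAD is not covered here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical quad table with an extra per-statement visibility flag.
conn.execute(
    "CREATE TABLE quad (g TEXT, s TEXT, p TEXT, o TEXT, "
    "visible INTEGER NOT NULL DEFAULT 1)"
)
# Hypothetical index covering the predicate, object, and visibility flag.
conn.execute("CREATE INDEX quad_pov ON quad (p, o, visible)")
conn.executemany(
    "INSERT INTO quad (g, s, p, o, visible) VALUES (?, ?, ?, ?, ?)",
    [
        ("g:data", "tag:1", "rdf:type", "nao:Tag", 1),
        ("g:data", "tag:2", "rdf:type", "nao:Tag", 1),
        ("g:onto", "nao:Tag", "rdf:type", "rdfs:Class", 0),
        ("g:meta", "tag:hidden", "rdf:type", "nao:Tag", 0),
    ],
)

# Visibility becomes one more equality condition: no EXISTS, no join.
QUERY = (
    "SELECT s FROM quad "
    "WHERE p = 'rdf:type' AND o = 'nao:Tag' AND visible = 1 ORDER BY s"
)
visible_tags = [row[0] for row in conn.execute(QUERY)]

# The fourth column of EXPLAIN QUERY PLAN output is the plan description.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
print(visible_tags)
print(plan)
```

In this toy model the visibility check is answered from the index alongside the type lookup, instead of by a subquery per result.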
I suggest we go with option 1 for 4.9, and option 2 for 4.10 and get rid of all the user visible stuff.
Any suggestions?

--
Vishesh Handa



_______________________________________________
Nepomuk mailing list
[email protected]
https://mail.kde.org/mailman/listinfo/nepomuk

Why not try 3? It looks like a simple and obvious solution from my point of view. I'm not a SPARQL guy, but I'm used to dealing with SQL and databases. I understand it's not the most elegant solution, but it will scale much better, and fast is never fast enough.


My 2 cents :)
Gaël
