Re: [Dbpedia-discussion] Distance in DBpedia

Jens Lehmann Fri, 20 Mar 2009 01:12:45 -0700

Hello,

Piero Molino wrote:
> Hi everyone,
> i trying to find a way to calculate the distance between two resurces
> withing dbpedia. I thought i could make a sareis of joins in sparql like
> this
> 
> SELECT  ?1 ?2 ?3 WHERE {
> {<http://dbpedia.org/resource/Category:Historical_board_games> ?p1 ?1.}
> UNION
> {?1 ?p2 <http://dbpedia.org/resource/Category:Historical_board_games>.}
> 
> {?1 ?p3 ?2.}
> UNION
> {?2 ?p4 ?1.}
> 
> {?3 ?p5 ?2.}
> UNION
> {?2 ?p6 ?3.}
> 
> {<http://dbpedia.org/resource/Economics> ?p7 ?3.}
> UNION
> {?3 ?p8 <http://dbpedia.org/resource/Economics>.}
> }
> 
> until i found at least one way between the resources, but as i thought
> this kind of approach is really too heavy to compute and starting from 3
> steps or more it will result in a timeout from the server executing the
> sparql query.


This looks like it should work, but performing UNION and JOINS
efficiently is probably very hard for Virtuoso in this case. Using LIMIT
1 and maybe a filter !isLiteral(?something) may give you better performance.

> So i was reading this article:
> Discovering Unknown Connections – the DBpedia Relationship Finder
> <http://www.informatik.uni-leipzig.de/~auer/publication/relfinder.pdf>
> 
> where are described two algorithms: the first one is a clustering one
> witch divides the whole rdf triples in clusters of conncted sub-graphs
> and assigns a distance value from a single random resource to all the
> others. The other one does more or less what i'm trying to do,
> calculating the ways that connect two nodes (also calculating the
> minimum distance and the maximum distance as absolute value of the
> difference and the sum of the relative distances to the central
> resource), but there's this instruction:
> "formulate SQL query for obtaining at most (n 􀀀 m) connections between O1
> and O2 of length d without objects and properties in the ignore list;"
> so it is similar to my original idea.... i really would have liked to
> see how it is implemented in an efficent way, so i downloaded the code
> but i was unable to run it bacause of a db problem (it requies a
> statement table and i donno where it cames from) and still can't find
> where this implementation is within the sourcecode.

You already found out that the implementation is in the DBpedia
subversion in the /relfinder directory. One reason why the queries are
more efficient than the one above is that we use a precomuted table
storing all triples "in both directions", i.e. for each triple S-P-O we
also store O-P-S. This way we do not need the union you mentioned above
and MySQL can better optimise the joins on the statements table. We
obviously also filter all cases where S or O are literals and remove
some properties we do not want when computing the connection between two
objects.

> By the way, have someone tried to solve this problem? are there any kind
> of suggestion?
> 
> I thought another possiblity could be the sequent: starting form the
> clsuterized rdf triples in the article, construct two trees with the 2
> resoruces we want to find the distance as roots. Then each son is a
> connected resource with a distance from the central resource fo the
> cluster wich is less than root's distance from the center. At the n-th
> level (where n is the distance from the central resource of the root
> resource) we will have the central resource for sure. Then for each
> resource in the first tree we search for it in the second one. Then we
> save every matching reasource in a list with the sum of the distances
> from the roots in the two trees. Once every resource is checked, the
> resurce in the list with the smaller value is the resource that
> minimizes the distance between the two nodes. The worst situation is
> that there are no common resources in the trees other than the central
> resource.
> 
> Maybe this works but it's a bit elaborate and probably hard to
> compute... maybe there's a simplier way. Any suggestion?

Getting all nodes with smaller distance from the root node and connected
to a specific node is computationally not a problem. What is likely to
be a problem is the number of nodes you have in those two trees, which
may grow very fast. Another problem is that you won't have a guarantee
to get the distance this way: Say you have a complete graph with three
vertices a, b, root. If you ask for the distance between a and b using
your method it would return 2 instead of one. (There are also other
counterexamples.) The reason is that going towards the randomly chosen
root node of a component does not necessarily mean that you are walking
on a shortest path.

Kind regards,

Jens

-- 
Dipl. Inf. Jens Lehmann
Department of Computer Science, University of Leipzig
Homepage: http://www.jens-lehmann.org
GPG Key: http://jens-lehmann.org/jens_lehmann.asc


------------------------------------------------------------------------------
Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are
powering Web 2.0 with engaging, cross-platform capabilities. Quickly and
easily build your RIAs with Flex Builder, the Eclipse(TM)based development
software that enables intelligent coding and step-through debugging.
Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion

Re: [Dbpedia-discussion] Distance in DBpedia

Reply via email to