Re: [Dbpedia-discussion] Distance in DBpedia

Piero Molino Fri, 20 Mar 2009 08:25:15 -0700

Il giorno 20/mar/09, alle ore 09:08, Jens Lehmann ha scritto:

>
> Hello,
>
> Piero Molino wrote:
>> Hi everyone,
>> i trying to find a way to calculate the distance between two resurces
>> withing dbpedia. I thought i could make a sareis of joins in sparql  
>> like
>> this
>>
>> SELECT  ?1 ?2 ?3 WHERE {
>> {<http://dbpedia.org/resource/Category:Historical_board_games> ?p1 ? 
>> 1.}
>> UNION
>> {?1 ?p2 <http://dbpedia.org/resource/ 
>> Category:Historical_board_games>.}
>>
>> {?1 ?p3 ?2.}
>> UNION
>> {?2 ?p4 ?1.}
>>
>> {?3 ?p5 ?2.}
>> UNION
>> {?2 ?p6 ?3.}
>>
>> {<http://dbpedia.org/resource/Economics> ?p7 ?3.}
>> UNION
>> {?3 ?p8 <http://dbpedia.org/resource/Economics>.}
>> }
>>
>> until i found at least one way between the resources, but as i  
>> thought
>> this kind of approach is really too heavy to compute and starting  
>> from 3
>> steps or more it will result in a timeout from the server executing  
>> the
>> sparql query.
>
> This looks like it should work, but performing UNION and JOINS
> efficiently is probably very hard for Virtuoso in this case. Using  
> LIMIT
> 1 and maybe a filter !isLiteral(?something) may give you better  
> performance.


As the objective is to find if there's at least one way of such a  
number of steps, the idea of limiting to 1 is perfect! Now i will try  
it and find to witch distance it has still acceptable performances.

>
>
>> So i was reading this article:
>> Discovering Unknown Connections – the DBpedia Relationship Finder
>> <http://www.informatik.uni-leipzig.de/~auer/publication/ 
>> relfinder.pdf>
>>
>> where are described two algorithms: the first one is a clustering one
>> witch divides the whole rdf triples in clusters of conncted sub- 
>> graphs
>> and assigns a distance value from a single random resource to all the
>> others. The other one does more or less what i'm trying to do,
>> calculating the ways that connect two nodes (also calculating the
>> minimum distance and the maximum distance as absolute value of the
>> difference and the sum of the relative distances to the central
>> resource), but there's this instruction:
>> "formulate SQL query for obtaining at most (n 􀀀 m) connections  
>> between O1
>> and O2 of length d without objects and properties in the ignore  
>> list;"
>> so it is similar to my original idea.... i really would have liked to
>> see how it is implemented in an efficent way, so i downloaded the  
>> code
>> but i was unable to run it bacause of a db problem (it requies a
>> statement table and i donno where it cames from) and still can't find
>> where this implementation is within the sourcecode.
>
> You already found out that the implementation is in the DBpedia
> subversion in the /relfinder directory. One reason why the queries are
> more efficient than the one above is that we use a precomuted table
> storing all triples "in both directions", i.e. for each triple S-P-O  
> we
> also store O-P-S. This way we do not need the union you mentioned  
> above
> and MySQL can better optimise the joins on the statements table. We
> obviously also filter all cases where S or O are literals and remove
> some properties we do not want when computing the connection between  
> two
> objects.

Yes, i wrote this email when i had a first approach to the code, now i  
spant some hours figuring out how it worked, so now it's really  
clearer, thanks for your explaination.

>
>
>> By the way, have someone tried to solve this problem? are there any  
>> kind
>> of suggestion?
>>
>> I thought another possiblity could be the sequent: starting form the
>> clsuterized rdf triples in the article, construct two trees with  
>> the 2
>> resoruces we want to find the distance as roots. Then each son is a
>> connected resource with a distance from the central resource fo the
>> cluster wich is less than root's distance from the center. At the n- 
>> th
>> level (where n is the distance from the central resource of the root
>> resource) we will have the central resource for sure. Then for each
>> resource in the first tree we search for it in the second one. Then  
>> we
>> save every matching reasource in a list with the sum of the distances
>> from the roots in the two trees. Once every resource is checked, the
>> resurce in the list with the smaller value is the resource that
>> minimizes the distance between the two nodes. The worst situation is
>> that there are no common resources in the trees other than the  
>> central
>> resource.
>>
>> Maybe this works but it's a bit elaborate and probably hard to
>> compute... maybe there's a simplier way. Any suggestion?
>
> Getting all nodes with smaller distance from the root node and  
> connected
> to a specific node is computationally not a problem. What is likely to
> be a problem is the number of nodes you have in those two trees, which
> may grow very fast. Another problem is that you won't have a guarantee
> to get the distance this way: Say you have a complete graph with three
> vertices a, b, root. If you ask for the distance between a and b using
> your method it would return 2 instead of one. (There are also other
> counterexamples.) The reason is that going towards the randomly chosen
> root node of a component does not necessarily mean that you are  
> walking
> on a shortest path.

What you say is surely correct, i wouldn't find if the nodes have  
distance 1 with the 2 trees i have described, but maybe working on the  
main idea there could find a working solution (maybe adding also the  
nodes at the same distance of the root? bythe way as i am short of  
timei think i'lluse what is granted to work)

>
>
> Kind regards,
>
> Jens

Thankyou for the clear answers and the help,
Piero

>
>
> -- 
> Dipl. Inf. Jens Lehmann
> Department of Computer Science, University of Leipzig
> Homepage: http://www.jens-lehmann.org
> GPG Key: http://jens-lehmann.org/jens_lehmann.asc
>
>
> ------------------------------------------------------------------------------
> Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM)  
> are
> powering Web 2.0 with engaging, cross-platform capabilities. Quickly  
> and
> easily build your RIAs with Flex Builder, the Eclipse(TM)based  
> development
> software that enables intelligent coding and step-through debugging.
> Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com
> _______________________________________________
> Dbpedia-discussion mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion




------------------------------------------------------------------------------
Apps built with the Adobe(R) Flex(R) framework and Flex Builder(TM) are
powering Web 2.0 with engaging, cross-platform capabilities. Quickly and
easily build your RIAs with Flex Builder, the Eclipse(TM)based development
software that enables intelligent coding and step-through debugging.
Download the free 60 day trial. http://p.sf.net/sfu/www-adobe-com
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion

Re: [Dbpedia-discussion] Distance in DBpedia

Reply via email to