[Dbpedia-discussion] Fwd: Distance in DBpedia

Piero Molino Fri, 20 Mar 2009 08:25:28 -0700

On 20 Mar 2009, at 08:44, Jens Lehmann wrote:

>
>
>
> Hello,
>
> Piero Molino wrote:
>> OK, no one has a solution for my problem ^_^ I hope that Jens will
>> answer this time, because he is one of the authors of the article I
>> cited in the previous message, so he knows best.
>>
>> I managed to find the functions I need in the relfinder source code
>> (they were in index.php) and I worked out how the database queries
>> are done; as I thought, they are practically a series of joins.
>> However, I can't get things working because of the statements table:
>> can someone who has tried this tell me how to construct it? May I
>> use the DBpedia CSV dumps and import them into a MySQL table like
>> this:
>>
>> (
>>  `subject` varchar(255) collate latin1_general_ci NOT NULL,
>>  `predicate` varchar(255) collate latin1_general_ci NOT NULL,
>>  `object` varchar(255) collate latin1_general_ci NOT NULL,
>>  `id` int(10) unsigned NOT NULL,
>>  PRIMARY KEY  (`id`)
>> )
>>
>> ? (this is the code of the CopyTable from the relfinder sourcecode)
>
> At the time we wrote the Relationship Finder, the statements table was
> easy to create. You just had to download the csv file of the DBpedia
> release and load it into your database. Now things have changed a bit
> since then. You have to perform slight modifications of the extraction
> code. I prepared a csv file for you here:
> http://downloads.dbpedia.org/tmp/infobox.csv.bz2


That's fantastic! Thank you very much, I really appreciate your help!

>
>
> In a next step, you have to load the data into your DB, which can be
> done using e.g. this PHP script on the command line:
>
> <?php
> // Set $user and $pass to your MySQL credentials before running.
> $connection = mysql_connect('localhost', $user, $pass, true)
>     or die(mysql_error());
> mysql_select_db('dbpedia_relfinder', $connection) or die(mysql_error());
>
> // Start from a clean statements table.
> mysql_query("DROP TABLE IF EXISTS statements") or die(mysql_error());
>
> mysql_query("CREATE TABLE `statements` (
> `id` int(10) unsigned NOT NULL auto_increment,
> `subject` varchar(255) collate latin1_general_ci NOT NULL default '',
> `predicate` varchar(255) collate latin1_general_ci NOT NULL default '',
> `object` text collate latin1_general_ci,
> `object_is` char(1) collate latin1_general_ci NOT NULL default '',
> PRIMARY KEY  (`id`),
> KEY `s_sub_pred_idx` (`subject`(200)),
> KEY `s_pred_idx` (`predicate`(200)),
> KEY `s_obj_idx` (`object`(250))
> ) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;"
> ) or die(mysql_error());
>
> // Bulk-load the unpacked CSV file from the current directory.
> mysql_query("LOAD DATA LOCAL INFILE 'infobox.csv' IGNORE INTO TABLE
> statements") or die(mysql_error());
> ?>
>
> The second step is computing the components of the RDF graph. To do
> this, you have to execute cluster_main.php on the command line. This
> can take hours or even days depending on your machine.

OK, I'll try it later today so that I can let the computer calculate
and enjoy itself :)
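
For anyone who wants to see what "computing the components" means on a small scale, here is a minimal sketch in Python using a union-find structure. It is not the code in cluster_main.php (I have not checked how that script works); it assumes statements are treated as undirected subject/object pairs, and the function names are made up for illustration.

```python
from collections import defaultdict


def find(parent, node):
    """Return the representative of node's set, shortening paths on the way."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]  # path halving
        node = parent[node]
    return node


def connected_components(edges):
    """Group nodes into components, treating each (subject, object) pair
    as an undirected edge. This is an assumption, not the relfinder code."""
    parent = {}
    for subject, obj in edges:
        parent.setdefault(subject, subject)
        parent.setdefault(obj, obj)
        root_s, root_o = find(parent, subject), find(parent, obj)
        if root_s != root_o:
            parent[root_s] = root_o  # merge the two sets
    groups = defaultdict(set)
    for node in parent:
        groups[find(parent, node)].add(node)
    return list(groups.values())
```

Two resources can only have a finite distance if they end up in the same set, which is why this step comes before any distance query.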

>
>
> Regarding the queries, you are right that they are basically joins. We
> can very easily detect whether two resources are in the same component
> of the graph and - as you read in the paper - we can also efficiently
> give a minimum and maximum value for the distance between two
> resources.

Yes, that is clear from the paper, and it is a really clever idea.
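
One way to obtain such bounds, sketched here in Python, is to store for every node its distance to a fixed root of its component and then apply the triangle inequality. This is only an illustration under that assumption, with an undirected graph; the adjacency dictionary and the function names are hypothetical and not taken from the relfinder source.

```python
from collections import deque


def depths_from_root(adjacency, root):
    """Breadth-first search: distance of every reachable node to root."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return depth


def distance_bounds(depth, a, b):
    """Lower and upper bound on the distance between a and b.

    Triangle inequality via the root:
    |d(a) - d(b)| <= dist(a, b) <= d(a) + d(b).
    """
    return abs(depth[a] - depth[b]), depth[a] + depth[b]
```

The bounds cost only two lookups per pair once the depths are stored, whereas the exact distance still requires a search.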

>
> The hard part is to detect the exact distance. Using MySQL, we found
> that joins perform quite reasonably if the distance is below 8 (if I
> remember correctly).
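
To illustrate the "series of joins" idea, here is a small self-contained sketch that uses Python's sqlite3 module instead of MySQL. It assumes a statements table with subject and object columns and follows statements only in the subject-to-object direction; the real relfinder queries may differ, and the function names and the max_hops default are my own choices.

```python
import sqlite3


def path_query(hops):
    """Build SQL that counts chains of exactly `hops` statements linking a
    start subject to an end object by joining the table with itself."""
    if hops < 1:
        raise ValueError("hops must be at least 1")
    joins = "".join(
        f" JOIN statements s{i} ON s{i}.subject = s{i - 1}.object"
        for i in range(2, hops + 1)
    )
    return (
        f"SELECT COUNT(*) FROM statements s1{joins} "
        f"WHERE s1.subject = ? AND s{hops}.object = ?"
    )


def exact_distance(conn, start, end, max_hops=7):
    """Try 1, 2, ... max_hops joins and return the first length that
    yields a chain, or None if none is found within the cap."""
    for hops in range(1, max_hops + 1):
        (count,) = conn.execute(path_query(hops), (start, end)).fetchone()
        if count:
            return hops
    return None
```

Each additional hop adds one more self-join, which is why the cost grows quickly with the distance.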

In my work I have to calculate a distance factor within a [0...1]
range. In the article there is a graph that shows the distance
distribution, but it clearly states that it has to be interpreted
cautiously because it is taken from a single randomly selected
node. There is also a reference to future work on a comprehensive
analysis. Have you already published the results of this analysis?
They would be extremely useful for my work and I will surely cite them
in the bibliography. If not, can you suggest a maximum distance
that I could use to get values in [0...1] by dividing the obtained
distance by it?
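
As a minimal sketch of the normalisation in question: the first function divides by a chosen maximum and clamps the result, and the second is an alternative that needs no maximum at all. The max_distance parameter is exactly the value being asked about, so no particular number is implied here, and both function names are hypothetical.

```python
def normalized_distance(distance, max_distance):
    """Map a distance to [0, 1] by dividing by max_distance.

    Distances above max_distance, and unreachable pairs (None),
    are mapped to 1.0. max_distance is an assumed, user-chosen cap.
    """
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    if distance is None:
        return 1.0
    return min(distance, max_distance) / max_distance


def saturating_distance(distance):
    """Alternative without a cap: d / (d + 1) is 0 for d = 0 and
    approaches 1 as d grows."""
    return distance / (distance + 1)
```

The first variant is linear but depends on the choice of cap; the second never needs a cap but compresses large distances.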

>
>
> In the meantime I have also seen other (relatively recent) approaches
> to compute the distance between resources, which are particularly
> targeted at large graphs. However, I do not have a handy literature
> reference available.

If you happen to find some literature about it, please tell me :)

>
>
> Currently, we are thinking about reviving the DBpedia Relationship
> Finder and are looking at ways to provide this tool without the
> involved maintenance overhead of keeping it up-to-date. This means
> that we will probably use SPARQL queries against Virtuoso. This
> approach works well for distances up to 3/4. (You can try SPARQL
> queries against the official DBpedia endpoint to test this.)

Yes, that is the same reason why I tried to find a Virtuoso query to do
it (in the previous mail): it would work on local Virtuoso
servers loaded with a DBpedia dump and also on the main DBpedia SPARQL
endpoint. As I said, I will try those queries to find the maximum
distance for which they complete in an acceptable time, and then
decide what to implement.
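
A fixed-length path query of this kind can be generated mechanically. The sketch below builds the SPARQL text for a chain of a given number of hops between two resources; it only constructs the string and does not contact any endpoint. The variable naming (?n1, ?p1, ...), the LIMIT value and the function name are my own assumptions, and the query follows triples in the subject-to-object direction only.

```python
def sparql_path_query(start_uri, end_uri, hops, limit=10):
    """Return a SPARQL SELECT query matching chains of exactly `hops`
    triples from start_uri to end_uri (one direction only)."""
    if hops < 1:
        raise ValueError("hops must be at least 1")
    # Chain: <start> ?p1 ?n1 . ?n1 ?p2 ?n2 . ... ?p{hops} <end> .
    nodes = (
        [f"<{start_uri}>"]
        + [f"?n{i}" for i in range(1, hops)]
        + [f"<{end_uri}>"]
    )
    patterns = [
        f"  {nodes[i]} ?p{i + 1} {nodes[i + 1]} ." for i in range(hops)
    ]
    return "SELECT * WHERE {\n" + "\n".join(patterns) + f"\n}} LIMIT {limit}"
```

Running the generated queries for 1, 2, 3, ... hops and timing each one is a simple way to find the largest distance that still resolves in an acceptable time.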

>
>
> Kind regards,
>
> Jens

Thank you, Jens; you probably don't know how helpful you have been. I'm
really grateful. Once I finish my work I will release it under the GPL,
so I hope that it will also be interesting for you, just as your work is
being useful to me.
Piero

>
>
>
> -- 
> Dipl. Inf. Jens Lehmann
> Department of Computer Science, University of Leipzig
> Homepage: http://www.jens-lehmann.org
> GPG Key: http://jens-lehmann.org/jens_lehmann.asc
>



