Hello,

Piero Molino wrote:
> Ok no one has a solution for my problem ^_^ I hope that Jens will
> answer this time because he is one of the authors of the article I
> cited in the previous message, so he knows best.
> 
> I managed to find the functions I need in the relfinder source code
> (they were in index.php) and I realized how the database queries are
> done; as I thought, they are practically a series of joins. However,
> I can't get things working because of the statements table: can
> someone who has tried this tell me how to construct it? May I use the
> DBpedia csv dumps and import them into a MySQL table like this:
> 
> (
>    `subject` varchar(255) collate latin1_general_ci NOT NULL,
>    `predicate` varchar(255) collate latin1_general_ci NOT NULL,
>    `object` varchar(255) collate latin1_general_ci NOT NULL,
>    `id` int(10) unsigned NOT NULL,
>    PRIMARY KEY  (`id`)
> )
> 
> ? (this is the code of the CopyTable from the relfinder sourcecode)

At the time we wrote the Relationship Finder, the statements table was
easy to create: you just had to download the csv file of the DBpedia
release and load it into your database. Things have changed a bit since
then, and you now have to make slight modifications to the extraction
code. I prepared a csv file for you here:
http://downloads.dbpedia.org/tmp/infobox.csv.bz2

As a first step, you have to decompress the file and load the data into
your DB, which can be done using e.g. this PHP script on the command line:

<?php
// Placeholder credentials: replace with your own MySQL user and password.
$user = 'your_user';
$pass = 'your_password';

$connection = mysql_connect('localhost', $user, $pass, true) or die(mysql_error());
mysql_select_db('dbpedia_relfinder', $connection) or die(mysql_error());

// Start from a clean table so the script can be re-run.
mysql_query("DROP TABLE IF EXISTS statements") or die(mysql_error());

// One row per statement, with prefix indexes on subject, predicate and object.
mysql_query("CREATE TABLE `statements` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `subject` varchar(255) collate latin1_general_ci NOT NULL default '',
  `predicate` varchar(255) collate latin1_general_ci NOT NULL default '',
  `object` text collate latin1_general_ci,
  `object_is` char(1) collate latin1_general_ci NOT NULL default '',
  PRIMARY KEY  (`id`),
  KEY `s_sub_pred_idx` (`subject`(200)),
  KEY `s_pred_idx` (`predicate`(200)),
  KEY `s_obj_idx` (`object`(250))
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;"
) or die(mysql_error());

// Bulk-load the decompressed csv file from the current directory.
mysql_query("LOAD DATA LOCAL INFILE 'infobox.csv' IGNORE INTO TABLE statements")
  or die(mysql_error());
?>

The second step is computing the components of the RDF graph. To do
this, you have to execute cluster_main.php on the command line. This can
take hours or even days depending on your machine.
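
To give an idea of what computing the components means, here is a
minimal Python sketch based on union-find. It is not the code in
cluster_main.php: the function name and the sample edges are made up for
illustration, and the sketch assumes that edges are treated as undirected
(subject, object) pairs.

```python
from collections import defaultdict


def connected_components(edges):
    """Group the nodes of an undirected graph into components (union-find)."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root
        # while shortening the path (path halving).
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for subject, obj in edges:
        root_s, root_o = find(subject), find(obj)
        if root_s != root_o:
            parent[root_o] = root_s  # merge the two components

    components = defaultdict(set)
    for node in list(parent):
        components[find(node)].add(node)
    return list(components.values())


# Hypothetical sample data: two separate components.
sample_edges = [("A", "B"), ("B", "C"), ("D", "E")]
components = connected_components(sample_edges)
```

Two resources are in the same component exactly when they end up in the
same set, which is the cheap check mentioned in the next paragraph.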

Regarding the queries, you are right that they are basically joins. We
can very easily detect whether two resources are in the same component
of the graph and - as you read in the paper - we can also efficiently
give a minimum and maximum value for the distance between two resources.
The hard part is to detect the exact distance. Using MySQL, we found
that joins perform reasonably well if the distance is below 8 (if I
remember correctly).
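
The join idea can be sketched as follows. This is only an illustration,
not the Relationship Finder queries from index.php: it assumes a
simplified statements table with just subject, predicate and object,
follows edges in one direction only, and uses an in-memory SQLite
database as a stand-in for MySQL. The function names and demo rows are
hypothetical.

```python
import sqlite3


def build_path_query(distance):
    """Build a self-join over `statements` matching paths of exactly `distance` edges."""
    if distance < 1:
        raise ValueError("distance must be at least 1")
    aliases = [f"s{i}" for i in range(1, distance + 1)]
    columns = [f"{a}.subject, {a}.predicate" for a in aliases]
    columns.append(f"{aliases[-1]}.object")
    joins = f"statements AS {aliases[0]}"
    for prev, cur in zip(aliases, aliases[1:]):
        # Each extra hop joins the previous object to the next subject.
        joins += f" JOIN statements AS {cur} ON {cur}.subject = {prev}.object"
    return (
        f"SELECT {', '.join(columns)} FROM {joins} "
        f"WHERE {aliases[0]}.subject = ? AND {aliases[-1]}.object = ?"
    )


def find_shortest_paths(conn, start, end, max_distance):
    """Try distances 1..max_distance; return (distance, rows) for the first hit."""
    for distance in range(1, max_distance + 1):
        rows = conn.execute(build_path_query(distance), (start, end)).fetchall()
        if rows:
            return distance, rows
    return None, []


def make_demo_db():
    """Create a tiny in-memory table with made-up statements."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT)")
    conn.executemany(
        "INSERT INTO statements VALUES (?, ?, ?)",
        [("A", "knows", "B"), ("B", "worksFor", "C"), ("C", "locatedIn", "D")],
    )
    return conn
```

Each additional hop adds one more self-join, which is why the cost grows
quickly with the distance.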

In the meantime I have also seen other (relatively recent) approaches to
compute the distance between resources, which are particularly targeted
at large graphs. However, I do not have a handy literature reference
available.

Currently, we are thinking about reviving the DBpedia Relationship
Finder and are looking at ways to provide this tool without the
maintenance overhead involved in keeping it up to date. This means that
we will probably use SPARQL queries against Virtuoso. This approach works
well for distances up to 3 or 4. (You can try SPARQL queries against the
official DBpedia endpoint to test this.)
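
As a rough sketch of what such a SPARQL query could look like, the
following Python helper builds a query for directed paths of a fixed
length between two resources. The helper name, the LIMIT of 10 and the
two example resource URIs are assumptions made for illustration; the
sketch only builds the query string and does not contact any endpoint.

```python
def build_sparql_path_query(source, target, distance):
    """Return a SPARQL SELECT for directed paths with exactly `distance` edges."""
    if distance < 1:
        raise ValueError("distance must be at least 1")
    # Chain: <source> ?p1 ?n1 . ?n1 ?p2 ?n2 . ... ?p{distance} <target> .
    nodes = [f"<{source}>"]
    nodes += [f"?n{i}" for i in range(1, distance)]
    nodes.append(f"<{target}>")
    patterns = [
        f"  {nodes[i]} ?p{i + 1} {nodes[i + 1]} ." for i in range(distance)
    ]
    variables = [f"?p{i}" for i in range(1, distance + 1)] + nodes[1:-1]
    return (
        "SELECT " + " ".join(variables) + " WHERE {\n"
        + "\n".join(patterns)
        + "\n} LIMIT 10"
    )


# Illustrative resources only; any two resource URIs can be used.
query = build_sparql_path_query(
    "http://dbpedia.org/resource/Leipzig",
    "http://dbpedia.org/resource/Germany",
    2,
)
print(query)
```

The printed query can be pasted into the query form of a SPARQL
endpoint; trying distances 1, 2, 3, ... in turn mirrors the join-based
approach described above.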

Kind regards,

Jens


-- 
Dipl. Inf. Jens Lehmann
Department of Computer Science, University of Leipzig
Homepage: http://www.jens-lehmann.org
GPG Key: http://jens-lehmann.org/jens_lehmann.asc


_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
