Olivier Grisel wrote:
> 2010/5/5 Paul Houle <[email protected]>:
>   
>> Just some thoughts here:
>>
>> (1) Sometimes page links get repeated.  I think this is just because
>> page A has N links to page B.  This doesn't have much semantic impact,
>> but it does bulk up the files a bit (though less w/ bz2) and makes more
>> work for my importer script
>>     
>
> That can be important to some extent when computing the PageRank of
> the Wikipedia graph, or for other graph algorithms that measure the
> proximity / relatedness of entities.
>
> BTW, it would be great if the DBpedia project could compute and
> distribute the PageRank or the TunkRank [1] values for the DBpedia
> resources based on the data of the page links graph. This is a really
> good scoring heuristic when performing fuzzy text named queries with
> several homonymic matches.
>
> [1] http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/
>   
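
To make the point about repeated links concrete, here is a minimal Python sketch (not DBpedia's or Virtuoso's method) of power-iteration PageRank in which N links from page A to page B count as an edge of weight N. The function name, parameters, and the tiny three-page graph are hypothetical and only illustrate how dropping duplicates changes the scores.

```python
from collections import Counter, defaultdict


def pagerank(links, damping=0.85, iterations=50, keep_duplicates=True):
    """Power-iteration PageRank over (source, target) link pairs.

    With keep_duplicates=True, repeated pairs act as edge weights.
    """
    if not keep_duplicates:
        links = set(links)
    weights = Counter(links)  # (source, target) -> number of links
    nodes = sorted({page for pair in weights for page in pair})
    out_weight = defaultdict(int)
    for (src, _dst), w in weights.items():
        out_weight[src] += w

    n = len(nodes)
    rank = {page: 1.0 / n for page in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(rank[p] for p in nodes if out_weight[p] == 0)
        new = {p: (1 - damping) / n + damping * dangling / n for p in nodes}
        for (src, dst), w in weights.items():
            new[dst] += damping * rank[src] * w / out_weight[src]
        rank = new
    return rank


# Hypothetical graph: A links to B three times and to C once.
links = [("A", "B"), ("A", "B"), ("A", "B"), ("A", "C"), ("B", "A"), ("C", "A")]
with_duplicates = pagerank(links)
deduplicated = pagerank(links, keep_duplicates=False)
```

With duplicates kept, B receives three quarters of A's outgoing weight and ends up ranked above C; after deduplication B and C are treated identically.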

Olivier,

Have you looked at this interface to DBpedia: http://dbpedia.org/fct ? 
Also look at the Entity Rank details in the "About" section.

Basically, you have two ranking schemes in place:

1. Entity Rank -- based on Link Coefficients
2. Text Scores

With Virtuoso you can use both or either to order your SPARQL query 
results. This has been available for quite some time now.
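
As a rough illustration of ordering by either scheme, the Python sketch below only builds a query string and a request URL; it does not contact any server. The `bif:contains ... OPTION (score ?sc)` pattern and the `<LONG::IRI_RANK>` function are written as I recall them from the Virtuoso SPARQL tutorial, so please check them against the tutorial link before relying on them. The helper name, its parameters, and the use of http://dbpedia.org/sparql as the endpoint are assumptions made for the example.

```python
from urllib.parse import urlencode

# Assumed endpoint; any Virtuoso SPARQL endpoint would be used the same way.
ENDPOINT = "http://dbpedia.org/sparql"

ORDERINGS = {
    "rank": "DESC(?rank)",               # Entity Rank only
    "score": "DESC(?sc)",                # Text Score only
    "both": "DESC(?sc) DESC(?rank)",     # Text Score first, then Entity Rank
}


def ranked_text_query(phrase, order_by="rank", limit=10):
    """Build a Virtuoso-flavoured SPARQL query ordered by rank and/or score."""
    if order_by not in ORDERINGS:
        raise ValueError("order_by must be one of: " + ", ".join(ORDERINGS))
    # Keep the sketch simple: accept only letters, digits and spaces,
    # so no escaping of the text-search expression is needed.
    if not phrase or not all(ch.isalnum() or ch == " " for ch in phrase):
        raise ValueError("phrase must contain only letters, digits and spaces")
    return (
        "SELECT DISTINCT ?s ?sc (<LONG::IRI_RANK>(?s) AS ?rank)\n"
        "WHERE {\n"
        "  ?s ?p ?o .\n"
        f"  ?o bif:contains '\"{phrase}\"' OPTION (score ?sc) .\n"
        "}\n"
        f"ORDER BY {ORDERINGS[order_by]}\n"
        f"LIMIT {int(limit)}"
    )


def query_url(query):
    """Return a GET URL for the query; fetching it is left to the caller."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urlencode(params)


example_query = ranked_text_query("New York", order_by="both")
example_url = query_url(example_query)
```

Switching `order_by` between "rank", "score" and "both" changes only the ORDER BY clause, which is the "both or either" choice described above.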

Links:

1. http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtuosoFacetsWebService
   -- Web Service Interface
2. http://www.openlinksw.com/dataspace/dav/wiki/Main/VirtuosoURIBurnerSampleTutorial
   -- Tutorial
3. http://www.openlinksw.com/dataspace/dav/wiki/Main/VirtuosoFacetsViewsLinkedData
   -- Faceted Views over Large-Scale Linked Data
4. http://virtuoso.openlinksw.com/presentations/SPARQL_Tutorials/SPARQL_Tutorials_Part_2/SPARQL_Tutorials_Part_2.html#%287%29
   -- Entity Rank & Text Scores examples from SPARQL Tutorial

Kingsley
>   
>> (3) Might be nice to extract the anchor text together with the link,
>> though then we're not talking about a triple anymore and have to put in
>> some of those dreaded blank nodes...  I've been thinking about training
>> decision rules for a namexer by capturing the text context that
>> pagelinks occur in, but I'd have to write my own extractor to do that.
>>     
>
> Or this could be extracted into an ad hoc CSV file, since I don't really
> see the point in having those in a knowledge base / triple store, but
> this is precious data for training machine-learning-based NLP models.
>
>   
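
For what such an ad hoc CSV file could look like, here is a small Python sketch. The column names and the sample row are hypothetical, not an existing DBpedia dump format; the only point is that anchor text and its surrounding context fit naturally in flat rows without blank nodes.

```python
import csv
import io

# Hypothetical column layout for a page-link/anchor-text dump.
COLUMNS = ["source_page", "target_page", "anchor_text", "context"]


def write_anchor_rows(rows, out):
    """Write (source, target, anchor text, context) rows as CSV to `out`."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    for row in rows:
        writer.writerow(row)


def read_anchor_rows(text):
    """Parse CSV text produced by write_anchor_rows into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Made-up sample row; commas and quotes in the context are escaped by csv.
sample_rows = [
    ("Paris", "France", "France", 'Paris is the capital of "France", in Europe.'),
]
buffer = io.StringIO(newline="")
write_anchor_rows(sample_rows, buffer)
parsed = read_anchor_rows(buffer.getvalue())
```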



-- 

Regards,

Kingsley Idehen       
President & CEO 
OpenLink Software     
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen 






------------------------------------------------------------------------------
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
