Olivier Grisel wrote:
> 2010/5/6 Kingsley Idehen <[email protected]>:
>   
>> Olivier Grisel wrote:
>>     
>>> 2010/5/6 Kingsley Idehen <[email protected]>:
>>>
>>>       
>>>> Olivier Grisel wrote:
>>>>
>>>>         
>>>>> 2010/5/5 Paul Houle <[email protected]>:
>>>>>
>>>>>
>>>>>           
>>>>>> Just some thoughts here:
>>>>>>
>>>>>> (1) Sometimes page links get repeated.  I think this is just because
>>>>>> page A has N links to page B.  This doesn't have much semantic impact,
>>>>>> but it does bulk up the files a bit (though less with bz2) and makes
>>>>>> more work for my importer script.
>>>>>>
>>>>>>
>>>>>>             
>>>>> That can be important to some extent when computing the PageRank of
>>>>> the Wikipedia graph, or for other graph algorithms that measure the
>>>>> proximity / relatedness of entities.
>>>>>
>>>>> BTW, it would be great if the DBpedia project could compute and
>>>>> distribute the PageRank or TunkRank [1] values for the DBpedia
>>>>> resources based on the data of the page links graph. This is a really
>>>>> good scoring heuristic when performing fuzzy text queries on entity
>>>>> names that have several homonymic matches.
>>>>>
>>>>> [1] http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/
>>>>>
>>>>>
>>>>>           
>>>> Olivier,
>>>>
>>>> Have you looked at this interface to DBpedia: http://dbpedia.org/fct ?
>>>> Also
>>>> look at the Entity Rank details in the "About" section.
>>>>
>>>> Basically, you have two ranking schemes in place:
>>>>
>>>> 1. Entity Rank -- based on Link Coefficients
>>>> 2. Text Scores
>>>>
>>>> With Virtuoso you can use either or both to order your SPARQL query
>>>> results. This has been so for quite some time now.
>>>>
>>>>         
>>> Yes, sure, I know that Virtuoso is able to do it. It would still be
>>> interesting to have that info as raw N-TRIPLES exports in the download
>>> section of DBpedia, for the millions of DBpedia entities, for offline
>>> batch processing outside of any triple store.
>>>
>>> It should not be a big deal to implement PageRank with Hadoop and Pig,
>>> but having it computed once and usable by anybody would still be
>>> useful IMHO.
>>>
>>>
>>>       
>> What vocabulary would drive the production of such an N-TRIPLES dump? I
>> haven't looked at VoiD for a while, but it might be the place for doing
>> something like this.
>>     
>
> One could use a dedicated property such as
> http://dbpedia.org/property/wikirank to state that it is a popularity
> rank between entities based on the PageRank algorithm applied to the
> dbpedia pagelinks + redirect graph, for instance.
>
>   

A possibility.
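For concreteness, a line of such an N-TRIPLES dump might look like the following, using the property URI Olivier proposed; the resource and the score value here are purely illustrative:

```
<http://dbpedia.org/resource/Berlin> <http://dbpedia.org/property/wikirank> "0.000042"^^<http://www.w3.org/2001/XMLSchema#double> .
```

A voiD dataset description could then document what the property means and how the scores were computed.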
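As an aside, the PageRank computation Olivier mentions is simple enough to sketch in plain Python. The toy link graph, damping factor, and iteration count below are illustrative stand-ins; a real run would stream the DBpedia pagelinks dump (e.g. via Hadoop/Pig as suggested) instead:

```python
# Minimal power-iteration PageRank over a toy "pagelinks" graph.
# The graph, damping factor, and iteration count are illustrative only.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping page -> list of pages it links to."""
    # Collect every node that appears as a source or a target.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Base rank from the random-jump term.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for page, targets in links.items():
            if targets:
                # Each page shares its rank equally among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank over all nodes.
                for node in nodes:
                    new_rank[node] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical three-page link graph, not real DBpedia data.
toy_graph = {
    "Berlin": ["Germany"],
    "Hamburg": ["Germany"],
    "Germany": ["Berlin"],
}
ranks = pagerank(toy_graph)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

On this toy graph the page with the most inlinks ("Germany") ends up with the highest score, and the ranks sum to 1 as expected of a probability distribution.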

-- 

Regards,

Kingsley Idehen       
President & CEO 
OpenLink Software     
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen
------------------------------------------------------------------------------
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
