2010/5/6 Kingsley Idehen <[email protected]>:
> Olivier Grisel wrote:
>>
>> 2010/5/6 Kingsley Idehen <[email protected]>:
>>
>>> Olivier Grisel wrote:
>>>
>>>> 2010/5/5 Paul Houle <[email protected]>:
>>>>
>>>>> Just some thoughts here:
>>>>>
>>>>> (1) Sometimes page links get repeated.  I think this is just because
>>>>> page A has N links to page B.  This doesn't have much semantic impact,
>>>>> but it does bulk up the files a bit (though less w/ bz2) and makes more
>>>>> work for my importer script.
>>>>
>>>> That can be important to some extent when computing the PageRank of
>>>> the Wikipedia graph, or when running other graph algorithms that measure
>>>> the proximity / relatedness of entities.
>>>>
>>>> BTW, it would be great if the DBpedia project could compute and
>>>> distribute the PageRank or TunkRank [1] values for DBpedia
>>>> resources based on the page links graph. This is a really
>>>> good scoring heuristic for fuzzy text queries on entity names that
>>>> return several homonymous matches.
>>>>
>>>> [1] http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/
>>>
>>> Olivier,
>>>
>>> Have you looked at this interface to DBpedia: http://dbpedia.org/fct ?
>>> Also look at the Entity Rank details in the "About" section.
>>>
>>> Basically, you have two ranking schemes in place:
>>>
>>> 1. Entity Rank -- based on Link Coefficients
>>> 2. Text Scores
>>>
>>> With Virtuoso you can use either or both to order your SPARQL query
>>> results. This has been available for quite some time now.
>>>
>>
>> Yes, I know that Virtuoso is able to do it. It would still be
>> interesting to have that info as raw N-Triples exports in the download
>> section of DBpedia, covering the millions of DBpedia entities, for
>> offline batch processing outside of any triple store.
>>
>> It should not be a big deal to implement PageRank with Hadoop and Pig,
>> but having it computed once and usable by anybody would still be
>> useful IMHO.
>
> What vocabulary would drive the production of such an N-Triples dump? I
> haven't looked at VoID for a while, but it might be the place for doing
> something like this.

One could use a dedicated property such as
http://dbpedia.org/property/wikirank to state that the value is a popularity
rank of entities, computed by applying the PageRank algorithm to, for
instance, the DBpedia pagelinks + redirects graph.
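
A minimal single-machine sketch of what producing such a dump could look
like, under these assumptions: the pagelinks file is N-Triples with IRI
subjects and objects, `redirects` is a dict mapping redirect page IRIs to
their target IRIs (resolved one hop only), and
http://dbpedia.org/property/wikirank is the hypothetical property proposed
above, not an existing DBpedia term. Repeated page links (point (1) in
Paul's mail) are kept as edge weights, since they can matter for PageRank;
pass `weighted=False` to count each link once. The function names are
illustrative only.

```python
import re
from collections import Counter, defaultdict

# Simplistic matcher for an N-Triples statement whose three terms are IRIs.
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')

# Hypothetical property suggested in this thread (not an existing DBpedia term).
WIKIRANK = "http://dbpedia.org/property/wikirank"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def parse_links(lines, redirects=None):
    """Count (source, target) page links, resolving both ends via redirects.

    Duplicate links are kept as a multiplicity count instead of being dropped.
    """
    redirects = redirects or {}
    counts = Counter()
    for line in lines:
        m = TRIPLE_RE.match(line)
        if not m:
            continue  # skip comments, blank lines and literal-valued triples
        src, _pred, dst = m.groups()
        src = redirects.get(src, src)  # one redirect hop only
        dst = redirects.get(dst, dst)
        if src != dst:  # ignore self-links, including ones created by redirects
            counts[(src, dst)] += 1
    return counts


def pagerank(counts, damping=0.85, iterations=50, weighted=True):
    """Power-iteration PageRank; scores sum to 1 over all pages seen.

    With weighted=True a link repeated k times carries k times the weight.
    """
    out_edges = defaultdict(dict)
    nodes = set()
    for (src, dst), k in counts.items():
        out_edges[src][dst] = k if weighted else 1
        nodes.update((src, dst))
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new = {node: (1.0 - damping) / n for node in nodes}
        dangling = 0.0
        for node in nodes:
            targets = out_edges.get(node)
            if not targets:
                dangling += rank[node]  # page with no out-links
                continue
            total = sum(targets.values())
            for dst, w in targets.items():
                new[dst] += damping * rank[node] * w / total
        # Spread the rank of dangling pages uniformly over all pages.
        for node in nodes:
            new[node] += damping * dangling / n
        rank = new
    return rank


def to_ntriples(rank):
    """Yield one N-Triples statement per page, typed as xsd:double."""
    for node, score in sorted(rank.items()):
        yield f'<{node}> <{WIKIRANK}> "{score:.6e}"^^<{XSD_DOUBLE}> .'
```

For the real dump one could pass `parse_links` a file object such as
`bz2.open("pagelinks.nt.bz2", "rt", encoding="utf-8")` (the file name is
hypothetical). This version keeps the whole graph in memory, so for the
millions of DBpedia entities a Hadoop/Pig job, as suggested earlier in the
thread, is probably the more realistic route; the arithmetic per iteration
stays the same.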

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
