2010/5/6 Kingsley Idehen <[email protected]>:
> Olivier Grisel wrote:
>> 2010/5/6 Kingsley Idehen <[email protected]>:
>>> Olivier Grisel wrote:
>>>> 2010/5/5 Paul Houle <[email protected]>:
>>>>> Just some thoughts here:
>>>>>
>>>>> (1) Sometimes page links get repeated. I think this is just because
>>>>> page A has N links to page B. This doesn't have much semantic impact,
>>>>> but it does bulk up the files a bit (though less w/ bz2) and makes more
>>>>> work for my importer script.
>>>>
>>>> That can matter to some extent when computing the PageRank of the
>>>> Wikipedia graph, or when running other graph algorithms that measure
>>>> the proximity / relatedness of entities.
>>>>
>>>> BTW, it would be great if the DBpedia project could compute and
>>>> distribute the PageRank or the TunkRank [1] values for the DBpedia
>>>> resources based on the data of the page links graph. This is a really
>>>> good scoring heuristic when performing fuzzy text queries on names
>>>> that have several homonymous matches.
>>>>
>>>> [1] http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/
>>>
>>> Olivier,
>>>
>>> Have you looked at this interface to DBpedia: http://dbpedia.org/fct ?
>>> Also look at the Entity Rank details in the "About" section.
>>>
>>> Basically, you have two ranking schemes in place:
>>>
>>> 1. Entity Rank -- based on Link Coefficients
>>> 2. Text Scores
>>>
>>> With Virtuoso you can use either or both to order your SPARQL query
>>> results. This has been the case for quite some time now.
>>
>> Yes, sure, I know that Virtuoso is able to do it. It would still be
>> interesting to have that info in raw N-TRIPLES exports in the download
>> section of DBpedia for the millions of DBpedia entities, for offline
>> batch processing outside of any triple store.
>>
>> It should not be a big deal to implement PageRank with Hadoop and Pig,
>> but having it computed once and usable by anybody would still be
>> useful IMHO.
>
> What vocabulary would drive the production of such an N-TRIPLES dump? I
> haven't looked at VoiD for a while, but it might be the place for doing
> something like this.
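To give a sense of the work involved, here is a minimal single-machine PageRank sketch in Python. It uses plain power iteration rather than the Hadoop/Pig setup suggested in the thread, and it assumes the page links have already been parsed into (source, target) pairs. The function name `pagerank`, its parameters, and the toy link list are hypothetical. The `dedupe` flag relates to the repeated page links point: with `dedupe=False`, N links from A to B give B N shares of A's rank instead of one.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50, dedupe=True):
    """Power-iteration PageRank over (source, target) link pairs.

    If dedupe is True, repeated links A -> B count once; otherwise each
    repetition gives B an extra share of A's rank.
    """
    if dedupe:
        edges = set(edges)
    nodes = set()
    out_links = defaultdict(list)
    for src, dst in edges:
        nodes.add(src)
        nodes.add(dst)
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread uniformly over all pages.
        dangling = sum(rank[node] for node in nodes if node not in out_links)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Each page passes a damped, equal share of its rank along every out-link.
        for src, targets in out_links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: page A links to page B twice.
links = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
print(pagerank(links))
print(pagerank(links, dedupe=False))
```

Redistributing the rank of pages with no out-links keeps the scores summing to 1. A graph the size of the full pagelinks dump would need a streaming or distributed implementation rather than in-memory dicts.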
One could use a dedicated property such as
http://dbpedia.org/property/wikirank to state that the value is a
popularity rank between entities, computed for instance by applying the
PageRank algorithm to the DBpedia pagelinks + redirect graph.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
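As a rough illustration of what such a dump could look like, the Python sketch below serializes precomputed scores as N-Triples lines using the http://dbpedia.org/property/wikirank property proposed in the message. That property is only a proposal, not an established DBpedia term. Typing the values as xsd:double, the helper name `rank_to_ntriples`, and the example scores are assumptions made for illustration. The sketch also assumes the resource URIs are already valid, escaped IRIs.

```python
# Proposed (hypothetical) property from the message; not an established DBpedia term.
WIKIRANK = "http://dbpedia.org/property/wikirank"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def rank_to_ntriples(ranks, predicate=WIKIRANK):
    """Yield one N-Triples line per resource, sorted by URI for stable output.

    Assumes every key in `ranks` is an already-escaped absolute IRI.
    """
    for uri in sorted(ranks):
        score = ranks[uri]
        # repr() of a finite float is a valid xsd:double lexical form.
        yield f'<{uri}> <{predicate}> "{score!r}"^^<{XSD_DOUBLE}> .'


# Hypothetical scores, e.g. as produced by a PageRank run.
ranks = {
    "http://dbpedia.org/resource/Paris": 0.25,
    "http://dbpedia.org/resource/Berlin": 0.125,
}
for line in rank_to_ntriples(ranks):
    print(line)
```

A VoiD description of the dataset could then document which property and which input graph were used.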
