Olivier Grisel wrote:
> 2010/5/6 Kingsley Idehen:
>
>> Olivier Grisel wrote:
>>
>>> 2010/5/6 Kingsley Idehen:
>>>
>>>> Olivier Grisel wrote:
>>>>
>>>>> 2010/5/5 Paul Houle:
>>>>>
>>>>>> Just some thoughts here:
>>>>>>
>>>>>> (1) Sometimes page links get repeated. I think this is just because
>>>>>> page A has N links to page B. This doesn't have much semantic impact,
>>>>>> but it does bulk up the files a bit (though less w/ bz2) and makes more
>>>>>> work for my importer script.
>>>>>>
>>>>> That can be important to some extent when computing the PageRank of
>>>>> the Wikipedia graph, or when running other graph algorithms that
>>>>> measure the proximity / relatedness of entities.
>>>>>
>>>>> BTW, it would be great if the DBpedia project could compute and
>>>>> distribute the PageRank or the TunkRank [1] values for the DBpedia
>>>>> resources based on the data of the page links graph. This is a really
>>>>> good scoring heuristic when performing fuzzy text named queries with
>>>>> several homonymic matches.
>>>>>
>>>>> [1] http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/
>>>>>
>>>> Olivier,
>>>>
>>>> Have you looked at this interface to DBpedia: http://dbpedia.org/fct ?
>>>> Also look at the Entity Rank details in the "About" section.
>>>>
>>>> Basically, you have two ranking schemes in place:
>>>>
>>>> 1. Entity Rank -- based on Link Coefficients
>>>> 2. Text Scores
>>>>
>>>> With Virtuoso you can use either or both to order your SPARQL query
>>>> results. This has been so for quite some time now.
>>>>
>>> Yes, sure, I know that Virtuoso is able to do it. It would still be
>>> interesting to have that info in raw N-TRIPLES exports in the download
>>> section of DBpedia for the millions of DBpedia entities, for offline
>>> batch processing outside of any triple store.
>>>
>>> It should not be a big deal to implement PageRank with Hadoop and Pig,
>>> but having it computed once and usable by anybody would still be
>>> useful IMHO.
>>>
>> What vocabulary would drive the production of such an N-TRIPLES dump? I
>> haven't looked at VoiD for a while, but it might be the place for doing
>> something like this.
>>
> One could use a dedicated property such as
> http://dbpedia.org/property/wikirank to state that it is a popularity
> rank between entities based on the PageRank algorithm applied to the
> DBpedia pagelinks + redirect graph, for instance.
>
A possibility.

--

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

_______________________________________________
Dbpedia-discussion mailing list
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
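For illustration, a minimal Python sketch of the batch computation discussed in this thread, under stated assumptions: PageRank by power iteration over the page links graph, with link endpoints first rewritten through the redirects, and the scores serialized as N-Triples. Assumptions: http://dbpedia.org/property/wikirank is only the name proposed above, not an existing DBpedia term; the inputs are hypothetical in-memory (source, target) pairs and a redirect dict rather than parsed dump files; `resolve_redirects`, `pagerank` and `to_ntriples` are hypothetical helper names. The `keep_duplicates` flag illustrates Paul's point (1) and Olivier's reply: when true, a page with N links to another page passes it N shares of its rank; when false, repeated links collapse into a single edge.

```python
from collections import Counter, defaultdict

# Hypothetical property name proposed in the thread; not an official DBpedia term.
WIKIRANK = "http://dbpedia.org/property/wikirank"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def resolve_redirects(edges, redirects):
    """Rewrite link endpoints through a redirect map (redirect URI -> target URI),
    dropping any self-loops the rewriting creates."""
    def follow(uri):
        seen = set()
        # Follow redirect chains, guarding against cycles.
        while uri in redirects and uri not in seen:
            seen.add(uri)
            uri = redirects[uri]
        return uri

    for src, dst in edges:
        s, d = follow(src), follow(dst)
        if s != d:
            yield s, d


def pagerank(edges, damping=0.85, iterations=50, keep_duplicates=True):
    """PageRank by power iteration over a directed edge list.

    With keep_duplicates=True, a page with N links to another page passes it
    N shares of its rank; with False, repeated links collapse to one edge.
    """
    weights = Counter(edges)
    if not keep_duplicates:
        weights = Counter(dict.fromkeys(weights, 1))
    nodes = set()
    out_weight = defaultdict(int)
    for (src, dst), w in weights.items():
        nodes.update((src, dst))
        out_weight[src] += w
    if not nodes:
        return {}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread uniformly over all pages.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for (src, dst), w in weights.items():
            new[dst] += damping * rank[src] * w / out_weight[src]
        rank = new
    return rank


def to_ntriples(rank):
    """Serialize scores as N-Triples lines with xsd:double literals, highest first."""
    return [
        f'<{uri}> <{WIKIRANK}> "{score:.6e}"^^<{XSD_DOUBLE}> .'
        for uri, score in sorted(rank.items(), key=lambda kv: -kv[1])
    ]


if __name__ == "__main__":
    R = "http://dbpedia.org/resource/"
    links = [
        (R + "A", R + "B"),
        (R + "A", R + "B"),  # the same page link repeated
        (R + "A", R + "Redirect_to_C"),
        (R + "B", R + "C"),
        (R + "C", R + "A"),
    ]
    redirects = {R + "Redirect_to_C": R + "C"}
    for line in to_ntriples(pagerank(list(resolve_redirects(links, redirects)))):
        print(line)
```

This in-memory version only shows the logic; for the millions of DBpedia entities, the same iteration would be expressed as a distributed job (e.g. with Hadoop and Pig, as suggested in the thread).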
