Olivier Grisel wrote:
> 2010/5/6 Kingsley Idehen <[email protected]>:
>> Olivier Grisel wrote:
>>> 2010/5/5 Paul Houle <[email protected]>:
>>>> Just some thoughts here:
>>>>
>>>> (1) Sometimes page links get repeated. I think this is just because
>>>> page A has N links to page B. This doesn't have much semantic impact,
>>>> but it does bulk up the files a bit (though less w/ bz2) and makes more
>>>> work for my importer script
>>>
>>> That can be important to some extent when computing the PageRank of
>>> the wikipedia graph. Or other graph algorithms to measure the proximity
>>> / relatedness of entities.
>>>
>>> BTW, that would be great if the DBpedia project could compute and
>>> distribute the PageRank or the TunkRank [1] values for the DBpedia
>>> resources based on the data of the page links graph. This is a really
>>> good scoring heuristic when performing fuzzy text named queries with
>>> several homonymic matches.
>>>
>>> [1] http://thenoisychannel.com/2009/01/13/a-twitter-analog-to-pagerank/
>>
>> Olivier,
>>
>> Have you looked at this interface to DBpedia: http://dbpedia.org/fct ? Also
>> look at the Entity Rank details in the "About" section.
>>
>> Basically, you have two ranking schemes in place:
>>
>> 1. Entity Rank -- based on Link Coefficients
>> 2. Text Scores
>>
>> With Virtuoso you can use both or either to order your SPARQL query results.
>> This has been so for quite some time now.
>
> Yes sure I know that virtuoso is able to do it. It would still be
> interesting to have that info in raw N-TRIPLES exports in the download
> section of DBpedia for the millions of dbpedia entities for offline
> batch processing out of any triple store.
>
> It should not be a big deal to implement PageRank with Hadoop and Pig
> but having it computed once and usable by anybody would still be
> useful IMHO.
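
For concreteness, here is a minimal Python sketch of the offline computation discussed above: a power-iteration PageRank over (source, target) page-link pairs, written out as N-Triples. It is an illustration under stated assumptions, not how DBpedia or Virtuoso compute their ranks. Repeated links are counted as extra edge weight, which is one possible reading of the point about page A having N links to page B. The predicate URI, the damping factor, the iteration count and the three toy resources are all placeholders chosen for the example.

```python
from collections import Counter

# Hypothetical predicate: not an existing DBpedia or VoID term.
RANK_PREDICATE = "http://example.org/vocab#pageRank"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over (source, target) pairs.

    A pair that occurs k times is treated as an edge of weight k.
    """
    weights = Counter(links)  # (source, target) -> multiplicity
    nodes = sorted({node for pair in weights for node in pair})
    out_weight = Counter()
    for (source, _target), weight in weights.items():
        out_weight[source] += weight

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {node: base for node in nodes}
        for (source, target), weight in weights.items():
            share = rank[source] * weight / out_weight[source]
            new_rank[target] += damping * share
        rank = new_rank
    return rank


def to_ntriples(rank):
    """Yield one N-Triples line per resource, sorted by URI."""
    for node in sorted(rank):
        yield '<%s> <%s> "%.6f"^^<%s> .' % (
            node, RANK_PREDICATE, rank[node], XSD_DOUBLE)


# Toy page-link data; the resource names are made up for the example.
A = "http://dbpedia.org/resource/Page_A"
B = "http://dbpedia.org/resource/Page_B"
C = "http://dbpedia.org/resource/Page_C"
links = [(A, B), (A, B), (A, C), (B, C), (C, A)]  # A links to B twice

ranks = pagerank(links)
for line in to_ntriples(ranks):
    print(line)
```

Passing `set(links)` instead of `links` drops the duplicates, so the two variants can be compared to see how much the repeated links shift the scores. At DBpedia scale the same iteration would have to run as a distributed job (the Hadoop and Pig route mentioned above) rather than in memory.
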

What vocabulary would drive the production of such an N-TRIPLES dump? I
haven't looked at VoID for a while, but it might be the place for doing
something like this.

-- 

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen

_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
