I know (or at least suspect) that Google has a distributed way of computing singular value decompositions for large matrices (e.g., the term-document matrix). I think the same dimension-reduction technique can be applied to approximate some eigenvalues of sparse matrices (the link matrix, for instance) - if I'm not mistaken; I'm getting kind of rusty on the LSA mathematics these days. This would eliminate the need for infusing fake links.
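To make that concrete, here is a rough sketch - definitely not whatever Google actually runs, just SciPy's ARPACK wrappers on a toy random matrix - of pulling a few leading eigenvalues and singular values straight out of a sparse matrix, without ever materializing a dense teleportation ("fake link") term:

# Sketch only: approximate leading eigenvalues / singular values of a sparse
# link-style matrix. The matrix here is random toy data, not a real crawl.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigs, svds

n = 5000
# Toy sparse "link matrix": entry (i, j) nonzero if page j links to page i.
A = sp.random(n, n, density=5e-4, format='csr', random_state=0)

# Column-normalize so each column sums to 1. Columns with no outlinks stay
# zero here; a production system has to handle such dangling nodes somehow.
col_sums = np.asarray(A.sum(axis=0)).ravel()
col_sums[col_sums == 0] = 1.0
A = A @ sp.diags(1.0 / col_sums)

# A few dominant eigenvalues of the sparse matrix itself.
vals, vecs = eigs(A, k=5, which='LM')
print("leading eigenvalues:", vals)

# The analogous truncated SVD used for LSA-style dimension reduction.
u, s, vt = svds(A, k=5)
print("leading singular values:", s)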
Anywho, the PageRank algorithm described in the original Google paper was said to work on their dataset, which wasn't very big at the time. I'm sure they have modified the algorithm a lot since it was first published. While on the topic, during the "Bourbon update" earlier this year, rumors were flying around about the "TrustRank" algorithm, which involves some human input to validate credible sources of data on the web. There's a paper from Stanford on that, http://www.vldb.org/conf/2004/RS15P3.PDF, which is a fun read if you're an LSA geek. (A rough sketch of that seed-biased idea follows below the quoted message.)

On 12/16/05, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
>
> Hi,
> found this link on a news site, may some can found this interesting.
> "An Israeli mathematician, Hillel Tal-Ezer, of the Academic College
> of Tel Aviv in Yaffo has written a paper on the faults of google's
> mathematical algorithms for page ranking"
> http://www2.mta.ac.il/~hillel/data_mining/faults_of_PageRank.pdf
>
> Cheers,
> Stefan
>
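P.S. Here is the sketch I mentioned above of the TrustRank idea as I understand it from the Stanford paper: ordinary PageRank-style power iteration, except the teleportation vector is concentrated on a hand-picked, human-validated seed set of trusted pages instead of being uniform. The function name and the tiny 5-page graph are my own invention for illustration, not from the paper.

# Toy TrustRank-style iteration; not production code, not the paper's exact formulation.
import numpy as np
import scipy.sparse as sp

def trustrank(A, trusted, alpha=0.85, iters=50):
    """A: sparse column-stochastic link matrix (A[i, j] = 1/outdeg(j) if j -> i).
    trusted: indices of human-validated seed pages."""
    n = A.shape[0]
    # Teleportation distribution concentrated on the trusted seed set.
    t = np.zeros(n)
    t[trusted] = 1.0 / len(trusted)

    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r_new = alpha * (A @ r) + (1 - alpha) * t
        r_new += (1.0 - r_new.sum()) * t   # redistribute mass lost to dangling pages
        r = r_new
    return r

# Tiny example: 5 pages, page 0 is the only trusted seed.
rows = [1, 2, 2, 3, 4, 0]   # link targets
cols = [0, 0, 1, 2, 2, 4]   # link sources
outdeg = np.bincount(cols, minlength=5)
data = [1.0 / outdeg[j] for j in cols]
A = sp.csr_matrix((data, (rows, cols)), shape=(5, 5))

print(trustrank(A, trusted=[0]))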
