I know (or at least suspect) that Google has a distributed way of computing
a singular value decomposition for large matrices (e.g. for the
term-document matrix). I think the same dimension-reduction technique can
be applied to approximate some eigenvalues of sparse matrices (the link
matrix, for instance), if I'm not mistaken - I'm getting kind of rusty on
the LSA mathematics these days. This would eliminate the need for infusing
fake links.
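
For anyone curious, here's a rough sketch of what I mean (not Google's
actual pipeline, obviously), using scipy's sparse ARPACK wrappers; the
matrices below are just random stand-ins for a term-document matrix and a
link matrix:

import scipy.sparse as sp
from scipy.sparse.linalg import svds, eigs

# Sparse "term-document" matrix: keep the top-k singular triplets (LSA-style).
A = sp.random(1000, 500, density=0.01, random_state=0, format="csr")
U, s, Vt = svds(A, k=10)                # 10 largest singular values/vectors

# Sparse "link" matrix: approximate a few dominant eigenvalues directly,
# without adding a dense damping/teleportation term (the "fake links").
L = sp.random(1000, 1000, density=0.005, random_state=1, format="csr")
vals, vecs = eigs(L, k=5, which="LM")   # 5 eigenvalues of largest magnitude

print(s[::-1])                          # singular values, descending
print(vals)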

Anywho, the PageRank algorithm described in the original Google paper was
said to work on their dataset, which wasn't very big at the time. I'm sure
they have modified the algorithm a lot since it was first published.
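
For reference, the published version is basically a power iteration over
the link graph - something like this toy version (the link structure is
made up, and this is the normalized "random surfer" variant rather than a
word-for-word transcription of the paper's formula):

import numpy as np

# page -> pages it links to (a made-up 4-page web)
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n = 4
d = 0.85                        # damping factor from the paper
pr = np.full(n, 1.0 / n)        # start from the uniform distribution

for _ in range(100):
    new = np.full(n, (1.0 - d) / n)
    for page, outs in links.items():
        if outs:
            for target in outs:
                new[target] += d * pr[page] / len(outs)
        else:                   # dangling page: spread its rank evenly
            new += d * pr[page] / n
    if np.abs(new - pr).sum() < 1e-10:
        break
    pr = new

print(pr)                       # should sum to ~1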

While on the topic, during the "Bourbon update" earlier this year, rumors
were flying around about the "TrustRank" algorithm, which involved some
human input in validating credible sources of data on the web. There's a
paper from Stanford on that, http://www.vldb.org/conf/2004/RS15P3.PDF ,
which is a fun read if you're an LSA geek.
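
If I read the paper right, the core of it is just a biased PageRank where
the teleportation vector is concentrated on a small, human-reviewed seed
set - roughly along these lines (the graph and seed set here are made up):

import numpy as np

# hypothetical web graph: page -> outlinks
links = {0: [1], 1: [2, 3], 2: [3], 3: [0], 4: [0]}
n = 5
seeds = [0, 1]                  # pages a human reviewer marked as trustworthy
d = 0.85

t = np.zeros(n)                 # teleport only to the trusted seed set
t[seeds] = 1.0 / len(seeds)
trust = t.copy()

for _ in range(100):
    new = (1.0 - d) * t
    for page, outs in links.items():
        for target in outs:
            new[target] += d * trust[page] / len(outs)
    trust = new

print(trust)                    # trust mass propagates out from the seeds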

On 12/16/05, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
>
> Hi,
> found this link on a news site, maybe some of you will find this
> interesting.
> "An Israeli mathematician, Hillel Tal-Ezer, of the Academic College
> of Tel Aviv in Yaffo has written a paper on the faults of google's
> mathematical algorithms for page ranking"
> http://www2.mta.ac.il/~hillel/data_mining/faults_of_PageRank.pdf
>
> Cheers,
> Stefan
>
