I'm pretty sure that what you describe is the case, especially considering that PageRank (which drives their search results) is a per-document value that is probably recomputed only at long intervals. I have also seen a MapReduce algorithm for computing PageRank. However, I do think they must be distributing the query load across many, many machines.
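To make the MapReduce-PageRank idea concrete, here is a toy, single-process sketch of one map/reduce iteration. The tiny graph, function names, and damping constant are invented for illustration; a real job would run distributed (e.g. on Hadoop) over a crawled web graph.

```python
from collections import defaultdict

DAMPING = 0.85  # standard damping factor from the PageRank paper

def map_phase(graph, ranks):
    """Map: each page emits rank/out_degree to every page it links to."""
    emitted = []
    for page, outlinks in graph.items():
        share = ranks[page] / len(outlinks)
        for target in outlinks:
            emitted.append((target, share))
    return emitted

def reduce_phase(emitted, num_pages):
    """Reduce: sum the contributions per page and apply damping."""
    sums = defaultdict(float)
    for page, share in emitted:
        sums[page] += share
    return {page: (1 - DAMPING) / num_pages + DAMPING * total
            for page, total in sums.items()}

# Toy 3-page web graph: page -> list of outlinks.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {page: 1.0 / len(graph) for page in graph}
for _ in range(20):  # iterate the map/reduce pair until ranks converge
    ranks = reduce_phase(map_phase(graph, ranks), len(graph))
```

Because each iteration is a full map/reduce pass over the whole graph, the per-job overhead adds up quickly, which fits Andrzej's point below that MR is suited to offline index building, not query serving.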
I also think that limiting the flat result set to the top 10 and then paging is optimized for performance. That is yet another reason why Google has not implemented faceted browsing or real-time clustering over their result set.

J.D.

On Feb 6, 2008 4:22 PM, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> (trimming excessive cc-s)
>
> Ning Li wrote:
> > No. I'm curious too. :)
> >
> > On Feb 6, 2008 11:44 AM, J. Delgado <[EMAIL PROTECTED]> wrote:
> >
> >> I assume that Google also has a distributed index over their
> >> GFS/MapReduce implementation. Any idea how they achieve this?
>
> I'm pretty sure that MapReduce/GFS/BigTable is used only for creating
> the index (as well as crawling, data mining, web graph analysis, static
> scoring, etc.). The overhead of MR jobs is just too high.
>
> Their impressive search response times are most likely the result of
> extensive caching of pre-computed partial hit lists for frequent terms
> and phrases - at least that's what I suspect after reading this paper
> (not by Google folks, but very enlightening):
> http://citeseer.ist.psu.edu/724464.html
>
> --
> Best regards,
> Andrzej Bialecki <><
> http://www.sigram.com  Contact: info at sigram dot com
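The caching of pre-computed hit lists that Andrzej describes can be sketched roughly as follows. Everything here is an invented illustration (the tiny inverted index, the term pairs), not Google's actual design: the idea is simply that frequent term combinations are intersected once and then served from memory.

```python
from functools import lru_cache

# Toy inverted index: term -> sorted list of document ids.
INDEX = {
    "distributed": [1, 3, 5, 8],
    "index": [2, 3, 5, 9],
    "google": [1, 3, 7],
}

@lru_cache(maxsize=1024)  # hot term pairs are answered from the cache
def hits(term_a, term_b):
    """Intersect two posting lists; repeated queries skip the work."""
    a = set(INDEX.get(term_a, []))
    b = set(INDEX.get(term_b, []))
    return tuple(sorted(a & b))

first = hits("distributed", "index")   # computed from the index
second = hits("distributed", "index")  # served from the cache
```

Only the top few results of such a cached list ever need ranking for page one, which is consistent with the top-10-plus-paging observation above.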