I'm pretty sure that what you describe is the case, especially considering
that PageRank (which drives their search results) is a per-document value
that is probably recomputed only after some long interval. I did see a
MapReduce algorithm to compute PageRank as well. However, I do think they
must be distributing the query load across many, many machines.
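For anyone curious, the MapReduce formulation of PageRank I saw looks roughly
like the sketch below. This is a toy in-memory version I wrote to illustrate
the idea, not Google's code: the graph, the damping factor of 0.85, and the
fixed iteration count are all conventional assumptions, and a real deployment
would shard the map and reduce phases across machines.

```python
# Minimal sketch of PageRank in MapReduce style (toy, single-machine).
# Assumptions: no dangling pages, damping factor 0.85, fixed iterations.
from collections import defaultdict

DAMPING = 0.85

def map_phase(graph, ranks):
    """Map: each page emits (target, rank_share) for every outlink."""
    emitted = []
    for page, outlinks in graph.items():
        for target in outlinks:
            emitted.append((target, ranks[page] / len(outlinks)))
    return emitted

def reduce_phase(graph, emitted, num_pages):
    """Reduce: sum the incoming shares per page, then apply damping."""
    sums = defaultdict(float)
    for target, share in emitted:
        sums[target] += share
    return {page: (1 - DAMPING) / num_pages + DAMPING * sums[page]
            for page in graph}

def pagerank(graph, iterations=20):
    """Iterate map + reduce until ranks (roughly) converge."""
    n = len(graph)
    ranks = {page: 1.0 / n for page in graph}
    for _ in range(iterations):
        ranks = reduce_phase(graph, map_phase(graph, ranks), n)
    return ranks
```

On a tiny graph like {"A": ["B", "C"], "B": ["C"], "C": ["A"]} the ranks sum
to 1 and C comes out highest, since it is linked from both A and B. The point
being: each iteration is a full batch job over the whole graph, which is
exactly why it only makes sense to recompute on a long interval, offline.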

I also think that limiting flat results to the top 10 and then paging is a
performance optimization. That is yet another reason why Google has not
implemented faceted browsing or real-time clustering over their result set.
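To make that concrete: serving only the top 10 means you never have to sort
or materialize the full hit list, just keep a bounded heap while scoring.
The sketch below is my own illustration of that trick (names and scores are
made up), not anything from Google's actual internals.

```python
# Sketch: selecting the top-k hits with a bounded heap instead of a full
# sort. heapq.nlargest keeps only k candidates in memory while scanning,
# which is why a "top 10 + paging" UI is cheap even over huge hit lists.
import heapq

def top_k(scored_hits, k=10):
    """Return the k highest-scoring (score, doc_id) pairs, best first."""
    return heapq.nlargest(k, scored_hits)

# Hypothetical hit list: (score, doc_id) pairs for 1000 matching docs.
hits = [((i * 37) % 97 / 97.0, "doc%d" % i) for i in range(1000)]
page1 = top_k(hits, k=10)
```

Facets or clustering, by contrast, need statistics over *all* hits, not just
the top 10, which is precisely the work this shortcut avoids.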

J.D.

On Feb 6, 2008 4:22 PM, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:

> (trimming excessive cc-s)
>
> Ning Li wrote:
> > No. I'm curious too. :)
> >
> > On Feb 6, 2008 11:44 AM, J. Delgado <[EMAIL PROTECTED]> wrote:
> >
> >> I assume that Google also has distributed index over their
> >> GFS/MapReduce implementation. Any idea how they achieve this?
>
> I'm pretty sure that MapReduce/GFS/BigTable is used only for creating
> the index (as well as crawling, data mining, web graph analysis, static
> scoring etc). The overhead of MR jobs is just too high.
>
> Their impressive search response times are most likely the result of
> extensive caching of pre-computed partial hit lists for frequent terms
> and phrases - at least that's what I suspect after reading this paper
> (not by Google folks, but very enlightening):
> http://citeseer.ist.psu.edu/724464.html
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
