http://research.google.com/pubs/DistributedSystemsandParallelComputing.html

On Thu, Jun 10, 2010 at 1:51 AM, Yuval Feinstein <yuv...@answers.com> wrote:
> Most of the implementation of Google's search index is kept secret by Google.
> Based on publicly available information, the indexes are quite different -
> Google uses its BigTable and MapReduce technologies to efficiently distribute 
> the index.
> There are similar efforts in the Lucene ecosystem - Solr Cloud is an advanced 
> one,
> Which is currently in development.
> As Google's scoring algorithm uses hundreds of signals, I guess they store 
> data pertinent to these signals in the index.
> Lucene's index holds relatively few pieces of information about every 
> document (posting lists, term vectors,
> Sometimes norms and payloads).
> I believe there are other differences as well,
> But one could only guess what they are...
> Cheers,
> Yuval
>
>
> -----Original Message-----
> From: luocanrao [mailto:luocan19826...@sohu.com]
> Sent: Wednesday, June 09, 2010 5:18 PM
> To: java-user@lucene.apache.org
> Subject: A question bout google search index?
>
> A news bout google search index. Index system of Lucene can also support
> realtime search,
>
> Is there some difference between them?
>
>
>
> With Caffeine, we analyze the web in small portions and update our search
> index on a continuous basis, globally. As we find new pages, or new
> information on existing pages, we can add these straight to the index. That
> means you can find fresher information than ever before-no matter when or
> where it was published.
>
>
>
> Caffeine lets us index web pages on an enormous scale. In fact, every second
> Caffeine processes hundreds of thousands of pages in parallel. If this were
> a pile of paper it would grow three miles taller every second. Caffeine
> takes up nearly 100 million gigabytes of storage in one database and adds
> new information at a rate of hundreds of thousands of gigabytes per day. You
> would need 625,000 of the largest iPods to store that much information; if
> these were stacked end-to-end they would go for more than 40 miles
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>



-- 
Lance Norskog
goks...@gmail.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to