http://research.google.com/pubs/DistributedSystemsandParallelComputing.html
On Thu, Jun 10, 2010 at 1:51 AM, Yuval Feinstein <yuv...@answers.com> wrote:
> Most of the implementation of Google's search index is kept secret by Google.
> Based on publicly available information, the indexes are quite different:
> Google uses its BigTable and MapReduce technologies to efficiently distribute
> the index.
> There are similar efforts in the Lucene ecosystem. Solr Cloud is an advanced
> one, which is currently in development.
> As Google's scoring algorithm uses hundreds of signals, I guess they store
> data pertinent to these signals in the index.
> Lucene's index holds relatively few pieces of information about every
> document (posting lists, term vectors, and sometimes norms and payloads).
> I believe there are other differences as well,
> but one could only guess what they are...
> Cheers,
> Yuval
>
> -----Original Message-----
> From: luocanrao [mailto:luocan19826...@sohu.com]
> Sent: Wednesday, June 09, 2010 5:18 PM
> To: java-user@lucene.apache.org
> Subject: A question bout google search index?
>
> Here is a news item about the Google search index. Lucene's index system can
> also support realtime search.
> Is there some difference between them?
>
> With Caffeine, we analyze the web in small portions and update our search
> index on a continuous basis, globally. As we find new pages, or new
> information on existing pages, we can add these straight to the index. That
> means you can find fresher information than ever before, no matter when or
> where it was published.
>
> Caffeine lets us index web pages on an enormous scale. In fact, every second
> Caffeine processes hundreds of thousands of pages in parallel. If this were
> a pile of paper it would grow three miles taller every second. Caffeine
> takes up nearly 100 million gigabytes of storage in one database and adds
> new information at a rate of hundreds of thousands of gigabytes per day. You
> would need 625,000 of the largest iPods to store that much information; if
> these were stacked end-to-end they would go for more than 40 miles.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org

--
Lance Norskog
goks...@gmail.com