That's big, and while I have not created indices that large with Lucene, I would expect disk I/O to be the biggest issue. That is why Nutch has distributed search built in, and even its demo has 'only' 100M documents. Perhaps you can mimic Nutch's distributed indexing and searching approach; see the sketch below.
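To make that concrete, here is a minimal sketch against the Lucene 1.x API of searching several index shards through a single MultiSearcher. The shard paths and the query field name are hypothetical; this is one process searching local shards, not a real distributed setup.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Searchable;

    public class ShardedSearch {
        public static void main(String[] args) throws Exception {
            // Hypothetical shard locations; an index of 315M+ documents
            // would be split into many more pieces, ideally one per
            // disk or machine so the I/O load is spread out.
            Searchable[] shards = {
                new IndexSearcher("/indexes/shard0"),
                new IndexSearcher("/indexes/shard1"),
                new IndexSearcher("/indexes/shard2")
            };

            // MultiSearcher runs the query against every shard and
            // merges the hits into a single ranked result set.
            MultiSearcher searcher = new MultiSearcher(shards);

            Query query = QueryParser.parse("distributed search",
                                            "contents",
                                            new StandardAnalyzer());
            Hits hits = searcher.search(query);
            System.out.println("Total hits across shards: " + hits.length());

            searcher.close();
        }
    }

To go further in the Nutch direction, each IndexSearcher could be replaced with a Searchable served from another machine (Lucene ships an RMI-based RemoteSearchable for this), or with ParallelMultiSearcher to at least query local shards concurrently.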
Otis

--- Will Allen <[EMAIL PROTECTED]> wrote:
> Hi,
> I am considering a project that would index 315+ million documents.
> I am comfortable that the indexing will work well in creating an
> index ~800GB in size, but am concerned about the query performance.
> (Is this a bad assumption?)
>
> What are the bottlenecks of performance as an index scales? Memory?
> Cost is not a concern, so what would be the shortcomings of a
> theoretical machine with 16GB of RAM, 4-16 CPUs and 1-2 terabytes
> of space? Would it be better to cluster machines to break apart
> the query?
>
> Thank you for your serious responses,
> Will Allen
