On Fri, 7 May 2004, Will Allen wrote: > Hi, > I am considering a project that would index 315+ million > documents. I am comfortable that the indexing will work well in creating > an index ~800GB in size, but am concerned about the query performance. > (Is this a = bad assumption?)
How fast do you need to return a response from a search? The largest index that I've created has over 200M documents and is about 125GB in size. The app has fairly low performance requirements and was done with pretty minimal hardware... > What are the bottlenecks of performance as an index scales? Memory? Yeah, I find that a 2GB heap size can be a bit tight with an index that size. 16GB sounds about right, but make sure your JVM can use it. > Cost is not a concern, so what would be the shortcomings of a > theoretical machine with 16GB of ram, 4-16 cpus and 1-2 terabytes of > space? Would it be better to cluster machines to break apart the > query? Assuming your budget can afford it and your design can utilize all those cpus effectively, I think you'd be worried about the underlying disk subsystem and how fast you can read the blocks you need from the index. Use the smallest 10k rpm (or 15k rpm) drives you can so your subsystem isn't spindle bound, multiple fibre HBAs and consider breaking apart that massive index into smaller sub-indexes. Vince @work @home vince.taluskie (at) cexp.com vince (at) taluskie.com Corporate Express; Technical Architect Westminster, CO Phone: 303 664 2660 http://www.taluskie.com --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
