Thanks, Jim -- your arithmetic is compelling. The queries are ad hoc, of course, but an average query might easily search on 10-15 fields. We avoid wildcards except in a few special cases, because they can be very slow. For example, searching for first names beginning with M -- firstName:M* -- takes well over 40 seconds on my index.
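To illustrate what that slow case does, here is a minimal sketch (the index path is an assumption) that runs the equivalent prefix query directly. As I understand it, Lucene rewrites the prefix into a BooleanQuery with one clause per matching term, so a one-letter prefix over a large term dictionary has an enormous amount of work to do:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.PrefixQuery;

    public class PrefixTest {
        public static void main(String[] args) throws Exception {
            // The path is an assumption -- point this at your own index.
            IndexSearcher searcher = new IndexSearcher("/path/to/index");

            // Equivalent of the query string firstName:M*
            PrefixQuery query = new PrefixQuery(new Term("firstName", "M"));

            Hits hits = searcher.search(query);
            System.out.println(hits.length() + " matches");
            searcher.close();
        }
    }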
Given these numbers, what is your experience of the hardware necessary to get decent performance out of large, highly interactive Lucene indexes? I'm thinking of Alex McManus's recent post "Making a case for Lucene", where he posits an index of over 100M documents -- a place where we plan to be too. It seems that all my issues are amenable to a hardware solution. I just don't know enough to recommend to my people exactly what kind of iron we need :(

-- Mark

-----Original Message-----
From: James Dunn [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 29, 2004 07:29 pm
To: Lucene Users List
Subject: RE: Running OutOfMemory while optimizing and searching

Mark,

What do your queries look like? The memory required for a query can be estimated with the following equation:

    1 byte * (number of fields in your query) * (number of docs in your index)

So if your query searches on all 50 fields of your 3.5 million document index, then each search would take about 175MB. If your 3-4 searches run concurrently, that's about 525MB to 700MB chewed up at once. Also, if your queries use wildcards, the memory requirements could be much greater.

Hope that helps,

Jim
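Spelling that arithmetic out -- a back-of-the-envelope sketch, nothing Lucene-specific, with the numbers taken straight from the figures above:

    public class MemoryEstimate {
        public static void main(String[] args) {
            long docs = 3500000L;  // documents in the index
            int fields = 50;       // fields searched per query
            int searches = 4;      // concurrent searches

            long perQuery = docs * fields;  // 1 byte per field per doc
            System.out.println("Per query:  " + (perQuery / 1000000) + " MB");            // 175 MB
            System.out.println("Concurrent: " + (searches * perQuery / 1000000) + " MB"); // 700 MB
        }
    }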
--- Mark Florence <[EMAIL PROTECTED]> wrote:

> Otis, thanks for considering this problem.
>
> I'm using all the default parameters -- and still scratching my head!
> I can't see any that would control memory usage. Plus, a 2GB heap is
> quite big. I see others have indexes bigger than mine, so I'm not sure
> how to find out why mine is throwing OutOfMemory -- not only on the
> optimize, but when 3-4 searchers are running, too.
>
> -- Mark
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, June 29, 2004 01:02 am
> To: Lucene Users List
> Subject: Re: Running OutOfMemory while optimizing and searching
>
> Mark,
>
> Tough situation. I hate when things like this happen on production :(.
> You are not mentioning what you are using for the various IndexWriter
> parameters. You may be able to get this working by tweaking them (see
> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#field_summary).
> Hm, now that I think about it, I am not sure if those are considered
> during index optimization. I'll try checking the sources later.
>
> Otis
>
> --- Mark Florence <[EMAIL PROTECTED]> wrote:
> > Hi, I'm using Lucene to index ~3.5M documents, over about 50 fields.
> > The Lucene index itself is ~10.5GB, spread over ~7,000 files. Some of
> > these files are "large" -- that is, several PRX files are ~1.5GB.
> >
> > Lucene runs on a dedicated server (Linux on a 1GHz Dell, with 1GB
> > RAM). Clients on other machines use RMI to perform reads/writes.
> > Each night the server automatically performs an optimize.
> >
> > The problem is that the optimize now dies with an OutOfMemory
> > exception, even when the JVM heap size is set to its maximum of 2GB.
> > I need to optimize, because as the number of Lucene files grows,
> > search performance becomes unacceptable.
> >
> > Search performance is also adversely affected because I've had to
> > effectively single-thread reads and writes. I was using a simple
> > read/write lock mechanism, allowing multiple readers to search
> > simultaneously, but now more than 3-4 simultaneous readers will also
> > cause an OutOfMemory condition. Searches can take as long as 30-40
> > seconds, and with single-threading, that's crippling the main client
> > application.
> >
> > Needless to say, the Lucene index is mission-critical, and must run
> > 24/7.
> >
> > I've seen other posts along this same vein, but no definite
> > consensus. Is my problem simply inadequate hardware? Should I run on
> > a 64-bit platform, where I can allocate a Java heap of > 2GB?
> >
> > Or could there be something fundamentally "wrong" with my index? I
> > should add that I've just spent about a week (!!) rebuilding from
> > scratch, over all 3.5M documents.
> >
> > -- Many thanks for any help! Mark Florence
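PS -- for anyone who wants to try Otis's suggestion, the IndexWriter parameters his javadoc link describes are public fields. A minimal sketch of tweaking them before the nightly optimize (the path is an assumption, and the values shown are the documented defaults, not recommendations):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class TuneAndOptimize {
        public static void main(String[] args) throws Exception {
            // Open the existing index; "false" means don't re-create it.
            IndexWriter writer = new IndexWriter("/path/to/index",
                                                 new StandardAnalyzer(), false);

            writer.mergeFactor = 10;                  // segments merged at a time
            writer.minMergeDocs = 10;                 // docs buffered in RAM before flush
            writer.maxMergeDocs = Integer.MAX_VALUE;  // cap on docs per merged segment

            writer.optimize();  // the step that currently runs out of memory
            writer.close();
        }
    }

Whether these fields are actually consulted during optimize() is exactly the question Otis raised, so treat this as a starting point rather than a fix.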
