Leon,

An index is typically a directory on disk containing files (commonly called "index files"). Each index has one or more segments, and each segment is made up of several index files. If you are using the compound file format, the situation is a bit different (fewer index files per segment).

Otis

P.S. You asked about Lucene in Action... :)
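For illustration, a minimal sketch of enabling the compound format and merging all segments down to one (assuming the Lucene 1.4-era IndexWriter API; the index path is hypothetical):

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class CompoundFormatDemo {
        public static void main(String[] args) throws IOException {
            // "/data/index" is a hypothetical path; false = open an existing index
            IndexWriter writer = new IndexWriter("/data/index", new StandardAnalyzer(), false);
            writer.setUseCompoundFile(true);  // pack each segment's many files into one .cfs file
            writer.optimize();                // merge all segments into a single segment
            writer.close();
        }
    }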
----- Original Message ----
From: Chris Hostetter <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, February 15, 2006 1:40:01 PM
Subject: Re: Size + memory restrictions

: We may have many different segments of our index, and it seems below we
: are using one IndexSearcher per segment. Could this explain why we run
: out of memory when using more than 2 or 3 segments?
: Anyone else have any comments on the below?

Terminology is a big issue here. When you use the word "segment" it seems
like you are talking about a segment of your data, which is a
self-contained index in its own right. My point in the comment you quoted
was that for a given index, you don't need more than one active
IndexSearcher open at a time; any more than that can waste resources.

I don't know what kind of memory overhead there is in a MultiSearcher,
but besides that you should also be looking at the other issues in the
message you quoted from: who/when is calling your getSearcher() method?
Is it getting called more often than the underlying indexes change? Who
is closing the old searchers when you open new ones?

: Many thanks
:
: Leon
: ps. At the moment I think it is set to only look at 2 segments
:
: private Searcher getSearcher() throws IOException {
:     if (mSearcher == null) {
:         synchronized (Monitor) {
:             // NOTE: sized to SearchersDir.size(), but only maxI slots are
:             // filled below; any remaining slots stay null when the array
:             // is handed to MultiSearcher.
:             Searcher[] srs = new IndexSearcher[SearchersDir.size()];
:             int maxI = 2;
:             // Searcher[] srs = new IndexSearcher[maxI];
:             int i = 0;
:             for (Iterator iter = SearchersDir.iterator();
:                  iter.hasNext() && i < maxI; i++) {
:                 String dir = (String) iter.next();
:                 try {
:                     srs[i] = new IndexSearcher(IndexDir + dir);
:                 } catch (IOException e) {
:                     log.error(ClassTool.getClassNameOnly(e) + ": " + e.getMessage(), e);
:                 }
:             }
:             mSearcher = new MultiSearcher(srs);
:             changeTime = System.currentTimeMillis();
:         }
:     }
:     return mSearcher;
: }
:
: ----- Original Message -----
: From: "Leon Chaddock" <[EMAIL PROTECTED]>
: To: <java-user@lucene.apache.org>
: Sent: Wednesday, February 15, 2006 9:28 AM
: Subject: Re: Size + memory restrictions
:
: > Hi Greg,
: > Thanks. We are actually running against 4 segments of 4 GB each, so
: > about 20 million docs. We can't merge the segments, as there seem to
: > be problems with our Linux box handling files over about 4 GB. Not
: > sure why that is.
: >
: > If I were to upgrade to 8 GB of RAM, does it seem likely this would
: > double the number of docs we can handle, or would it provide an
: > exponential increase?
: >
: > Thanks
: >
: > Leon
: > ----- Original Message -----
: > From: "Greg Gershman" <[EMAIL PROTECTED]>
: > To: <java-user@lucene.apache.org>
: > Sent: Wednesday, February 15, 2006 12:41 AM
: > Subject: Re: Size + memory restrictions
: >
: >> You may consider incrementally adding documents to your index; I'm
: >> not sure why there would be problems adding to an existing index,
: >> but you can always add additional documents. You can optimize later
: >> to get everything back into a single segment.
: >>
: >> Querying is a different story; if you are using the Sort API, you
: >> will need enough memory to store a full sorting of your documents in
: >> memory. If you're trying to sort on a string, or anything other than
: >> an int or float, this can require a lot of memory.
: >>
: >> I've used indices much bigger than 5 million docs / 3.5 GB with less
: >> than 4 GB of RAM and had no problems.
: >>
: >> Greg
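For reference, a minimal sketch of the Sort API cost Greg describes, assuming the Lucene 1.4-era API; the index path, field names, and query are hypothetical:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TermQuery;

    public class SortMemoryDemo {
        public static void main(String[] args) throws Exception {
            IndexSearcher searcher = new IndexSearcher("/data/index");
            Query query = new TermQuery(new Term("body", "lucene"));

            // Sorting on a string field caches one term per document in memory;
            Hits byTitle = searcher.search(query, new Sort(new SortField("title", SortField.STRING)));

            // an int field costs roughly 4 bytes per document instead.
            Hits byPrice = searcher.search(query, new Sort(new SortField("price", SortField.INT)));

            System.out.println(byTitle.length() + " / " + byPrice.length());
            searcher.close();
        }
    }

Plain relevance ranking (searcher.search(query) with no Sort) avoids this per-document cache entirely.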
: >>
: >> --- Leon Chaddock <[EMAIL PROTECTED]> wrote:
: >>
: >>> Hi,
: >>> we are having tremendous problems building a large Lucene index and
: >>> querying it.
: >>>
: >>> The programmers are telling me that when the index file reaches
: >>> 3.5 GB or 5 million docs, the index file can no longer grow any
: >>> larger.
: >>>
: >>> To rectify this they have built index files in multiple directories.
: >>> Now apparently my 4 GB of memory is not enough to query.
: >>>
: >>> Does this seem right to people, or does anyone have any experience
: >>> with largish-scale projects?
: >>>
: >>> I am completely tearing my hair out here and don't know what to do.
: >>>
: >>> Thanks

-Hoss
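For illustration, a minimal sketch of the reopen-and-close discipline Hoss is asking about: keep one live Searcher per index, replace it only when the index has actually changed, and close the one it replaces. This assumes the Lucene 1.4-era API; the class and field names are hypothetical, and a production version would defer close() until in-flight searches on the old searcher have finished:

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Searcher;

    public class SearcherHolder {
        private final String indexPath;  // hypothetical: one index directory
        private Searcher current;
        private long version = -1;       // index version 'current' was opened against

        public SearcherHolder(String indexPath) {
            this.indexPath = indexPath;
        }

        public synchronized Searcher getSearcher() throws IOException {
            long latest = IndexReader.getCurrentVersion(indexPath);
            if (current == null || latest != version) {
                Searcher old = current;
                current = new IndexSearcher(indexPath);  // open the new generation
                version = latest;
                if (old != null) {
                    old.close();  // naive: assumes no searches still running on it
                }
            }
            return current;
        }
    }

With several index directories, one holder per directory can feed a MultiSearcher that is rebuilt under the same changed-version condition, rather than on every call to getSearcher().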