Leon,

An index is typically a directory on disk containing files (commonly called "index files"). Each index has one or more segments, and each segment is made up of several index files. If you are using the compound file format, the situation is a bit different (fewer index files per segment).

Otis

P.S. You asked about Lucene in Action... :)
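For illustration, a minimal sketch of enabling the compound format and merging all segments down to one (assuming the Lucene 1.4-era IndexWriter API; the index path is hypothetical):

    import java.io.IOException;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class CompoundFormatDemo {
        public static void main(String[] args) throws IOException {
            // "/data/index" is a hypothetical path; false = open an existing index
            IndexWriter writer = new IndexWriter("/data/index", new StandardAnalyzer(), false);
            writer.setUseCompoundFile(true);  // pack each segment's many files into one .cfs file
            writer.optimize();                // merge all segments into a single segment
            writer.close();
        }
    }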
----- Original Message ----
From: Chris Hostetter <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Wednesday, February 15, 2006 1:40:01 PM
Subject: Re: Size + memory restrictions

: We may have many different segments of our index, and it seems below we
: are using one IndexSearcher per segment. Could this explain why we run
: out of memory when using more than 2 or 3 segments?
: Anyone else have any comments on the below?

Terminology is a big issue here. When you use the word "segment" it seems
like you are talking about a segment of your data, which is a
self-contained index in its own right. My point in the comment you quoted
was that for a given index, you don't need more than one active
IndexSearcher open at a time; any more than that can waste resources.

I don't know what kind of memory overhead there is in a MultiSearcher,
but besides that you should also be looking at the other issues in the
message you quoted from: who/when is calling your getSearcher() method?
Is it getting called more often than the underlying indexes change? Who
is closing the old searchers when you open new ones?

: Many thanks
:
: Leon
: ps. At the moment I think it is set to only look at 2 segments
:
: private Searcher getSearcher() throws IOException {
:     if (mSearcher == null) {
:         synchronized (Monitor) {
:             // NOTE: sized to SearchersDir.size(), but only maxI slots are
:             // filled below; any remaining slots stay null when the array
:             // is handed to MultiSearcher.
:             Searcher[] srs = new IndexSearcher[SearchersDir.size()];
:             int maxI = 2;
:             // Searcher[] srs = new IndexSearcher[maxI];
:             int i = 0;
:             for (Iterator iter = SearchersDir.iterator();
:                  iter.hasNext() && i < maxI; i++) {
:                 String dir = (String) iter.next();
:                 try {
:                     srs[i] = new IndexSearcher(IndexDir + dir);
:                 } catch (IOException e) {
:                     log.error(ClassTool.getClassNameOnly(e) + ": " + e.getMessage(), e);
:                 }
:             }
:             mSearcher = new MultiSearcher(srs);
:             changeTime = System.currentTimeMillis();
:         }
:     }
:     return mSearcher;
: }
:
: ----- Original Message -----
: From: "Leon Chaddock" <[EMAIL PROTECTED]>
: To: <java-user@lucene.apache.org>
: Sent: Wednesday, February 15, 2006 9:28 AM
: Subject: Re: Size + memory restrictions
:
: > Hi Greg,
: > Thanks. We are actually running against 4 segments of 4 GB each, so
: > about 20 million docs. We can't merge the segments, as there seem to
: > be problems with our Linux box handling files over about 4 GB. Not
: > sure why that is.
: >
: > If I were to upgrade to 8 GB of RAM, does it seem likely this would
: > double the number of docs we can handle, or would it provide an
: > exponential increase?
: >
: > Thanks
: >
: > Leon
: > ----- Original Message -----
: > From: "Greg Gershman" <[EMAIL PROTECTED]>
: > To: <java-user@lucene.apache.org>
: > Sent: Wednesday, February 15, 2006 12:41 AM
: > Subject: Re: Size + memory restrictions
: >
: >> You may consider incrementally adding documents to your index; I'm
: >> not sure why there would be problems adding to an existing index,
: >> but you can always add additional documents. You can optimize later
: >> to get everything back into a single segment.
: >>
: >> Querying is a different story; if you are using the Sort API, you
: >> will need enough memory to store a full sorting of your documents in
: >> memory. If you're trying to sort on a string, or anything other than
: >> an int or float, this can require a lot of memory.
: >>
: >> I've used indices much bigger than 5 million docs / 3.5 GB with less
: >> than 4 GB of RAM and had no problems.
: >>
: >> Greg
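For reference, a minimal sketch of the Sort API cost Greg describes, assuming the Lucene 1.4-era API; the index path, field names, and query are hypothetical:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TermQuery;

    public class SortMemoryDemo {
        public static void main(String[] args) throws Exception {
            IndexSearcher searcher = new IndexSearcher("/data/index");
            Query query = new TermQuery(new Term("body", "lucene"));

            // Sorting on a string field caches one term per document in memory;
            Hits byTitle = searcher.search(query, new Sort(new SortField("title", SortField.STRING)));

            // an int field costs roughly 4 bytes per document instead.
            Hits byPrice = searcher.search(query, new Sort(new SortField("price", SortField.INT)));

            System.out.println(byTitle.length() + " / " + byPrice.length());
            searcher.close();
        }
    }

Plain relevance ranking (searcher.search(query) with no Sort) avoids this per-document cache entirely.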
: >>
: >> --- Leon Chaddock <[EMAIL PROTECTED]> wrote:
: >>
: >>> Hi,
: >>> we are having tremendous problems building a large Lucene index and
: >>> querying it.
: >>>
: >>> The programmers are telling me that when the index file reaches
: >>> 3.5 GB or 5 million docs, the index file can no longer grow any
: >>> larger.
: >>>
: >>> To rectify this they have built index files in multiple directories.
: >>> Now apparently my 4 GB of memory is not enough to query.
: >>>
: >>> Does this seem right to people, or does anyone have any experience
: >>> with largish-scale projects?
: >>>
: >>> I am completely tearing my hair out here and don't know what to do.
: >>>
: >>> Thanks

-Hoss
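For illustration, a minimal sketch of the reopen-and-close discipline Hoss is asking about: keep one live Searcher per index, replace it only when the index has actually changed, and close the one it replaces. This assumes the Lucene 1.4-era API; the class and field names are hypothetical, and a production version would defer close() until in-flight searches on the old searcher have finished:

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Searcher;

    public class SearcherHolder {
        private final String indexPath;  // hypothetical: one index directory
        private Searcher current;
        private long version = -1;       // index version 'current' was opened against

        public SearcherHolder(String indexPath) {
            this.indexPath = indexPath;
        }

        public synchronized Searcher getSearcher() throws IOException {
            long latest = IndexReader.getCurrentVersion(indexPath);
            if (current == null || latest != version) {
                Searcher old = current;
                current = new IndexSearcher(indexPath);  // open the new generation
                version = latest;
                if (old != null) {
                    old.close();  // naive: assumes no searches still running on it
                }
            }
            return current;
        }
    }

With several index directories, one holder per directory can feed a MultiSearcher that is rebuilt under the same changed-version condition, rather than on every call to getSearcher().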