Thanks, Otis. Also appreciate your wonderful book, "Lucene in Action". The book is so well written that it makes me very curious about the low level design of the system, in addition to how to use it.
Back the cache problem, I agree that the native OS file system can do most of the job for me. However, there are two issues if we only rely on OS's cache. 1. Performance. Since all the cached disk data resides outside JVM heap space, the access efficiency from Java object to those cached data cannot be too high. 2. Volatile. Since the OS caches the disk data in a common area shared by multiple processes, but not only JVM. If there are other processes doing disk IO at the same time, chances are the cached Lucene index data from disk may be wiped. Therefore, a more reliable and efficient cache should reside inside JVM heap space. But due to the crowded JVM heap space, we have to manually "evict" the less frequently used data from the cache. Did I mis-understand anything? thanks again, Otis. Kan --- Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > Kan, > > Some (all?) of what you described will typically be > handled for you by the file system. Yes, the JVM > would blow up with a OOM error if the index is too > big to fit in RAM. > > Otis > > ----- Original Message ---- > From: Kan Deng <[EMAIL PROTECTED]> > To: java-user@lucene.apache.org > Cc: Kan Deng <[EMAIL PROTECTED]> > Sent: Wed 11 Jan 2006 08:37:25 PM EST > Subject: Cache index in RAMDirectory and evict > > Hi, there, > > In "Lucene in action", it mentions in Section 3.2.3 > "reading indexes into memory" that, > > "...RAMDirectory's constructor can be used to read a > file system-based index into memory, allowing the > application that accesses it to benefit from the > superior speed of the RAM: > > RAMDirectory ramDir = new RAMDirectory(dir)" > > Some questions here need help, > > 1. Suppose the content in the FSDirectory index is > read-only, but since it is so big that it exceeds > the > capacity of the JVM heap space. When constructing a > RAMDirectory to cache the entire FSDirectory, will > it > blow the JVM? > > 2. How to cache into the RAMDirectory with the most > frequently used index parts from the FSDirectory? > > The purpose is that to serve search query, first > of > all, search it in the RAMDirectory, if missed, goto > FSDirectory. > > My question is how to implement this > RAMDirectory-based cache. I assume it takes 3 steps. > Is it an appropriate workflow? > > a) Search in the RAMDirectory. > b) If missed, search in the FSDirectory > c) Add the documents from the FSDirectory to > RAMDirectory, > and remove some less frequently used document > from the RAMDirectory to save memory consumption. > > > 3. To make the cache mechanism more powerful, we can > count the frequency of the usage of every document > in > the RAMDirectory, and evict those less frequently > used > documents. > > How to implement the eviction? In details, is it > good enough by counting the usage of each documents > in > the index, and delete those documents not used very > often? Any better idea? > > > thanks, > Kan > > > __________________________________________________ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam > protection around > http://mail.yahoo.com > > --------------------------------------------------------------------- > To unsubscribe, e-mail: > [EMAIL PROTECTED] > For additional commands, e-mail: > [EMAIL PROTECTED] > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: > [EMAIL PROTECTED] > For additional commands, e-mail: > [EMAIL PROTECTED] > > __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]