Thanks for the clarification, Uwe. If the whole idea is a new, more efficient RAMDirectory implementation, then it's OK. I think the ideas you describe are interesting.

> Have you tried MMapDir for read access in comparison to RAMDirectory for a
> larger index

I have, and I support the decision not to use RAMDirectory for such cases. BUT, MMapDir is not recommended for use on all platforms / JDKs. Second, it cannot be used on e.g. HDFS. So sometimes RAMDirectory is the best you can do.

Again, if the whole idea is improving RAMDirectory's implementation, then I totally agree with that and it makes sense. My point was that we should not lose the ability to load indexes into RAM.

Shai

On Tue, Dec 20, 2011 at 3:36 PM, Uwe Schindler <[email protected]> wrote:

> Hi,
>
> You misunderstood the whole thing. The idea was to maybe replace
> RAMDirectory by a “clone” of MMapDirectory that uses large
> DirectByteBuffers outside the JVM heap. The current RAMDirectory is very
> limited (the buffer size is hardcoded to 8 KB; if you have a 50 gigabyte
> index in this RAMDirectory, your GC simply goes crazy. We investigated this
> several times for customers. RAMDirectory was in fact several times slower
> than a simple disk-based MMapDir). Also the locking on the RAMFile class is
> horrible, as for large indexes you have to change buffers several times when
> seeking/reading/…, which does heavy locking. In contrast, MMapDir is
> completely lock-free!
>
> Until there is a replacement we will not remove it, but the current
> RAMDirectory is not usable for large indexes. That’s a limitation and the
> design of this class does not support anything else. It’s currently
> unfixable, and instead of putting work into fixing it, the time should be
> spent working on a new ByteBuffer-based RAMDir with larger blocks/blocks
> that merge, or an IOContext that helps to calculate the file size before
> writing it (e.g. when triggering a merge you know the approximate size of
> the file beforehand, so you can allocate a buffer that’s better than
> 8 kilobytes). Also, a direct ByteBuffer helps to make GC happy, as the
> RAMDir is outside the JVM heap.
>
> > Also, RAMDirectory is still more efficient than MMapDirectory, if
> > you want to index (and then search) on a small (sometimes even transient)
> > amount of data
>
> That’s not true, as RAMDir spends more time switching buffers than
> reading the data. The problem is that MMapDir does not support *writing*,
> and that is why we plan to improve this. Have you tried MMapDir for read
> access in comparison to RAMDirectory for a larger index? It outperforms it
> several times over (depending on the OS and whether the file data is
> already in the FS cache). The new directory will simply mimic the
> MMapIndexInput and add an MMapIndexOutput, based not on an mmapped buffer
> but on an in-memory (Direct)ByteBuffer (outside or inside the JVM heap;
> both will be supported). This simplifies the code a lot.
>
> The limitations of the crappy RAMDirectory were discussed at conferences,
> sorry. We did *not* decide to remove it (without a patch/replacement). The
> whole “message” on the issue was that RAMDirectory is a bad idea. The
> recommended approach at the moment to handle large in-RAM directories
> would be to use a tmpfs on Linux/Solaris and use MMapDir on top (for
> larger indexes). The mmap would then directly map the RAM of the
> underlying tmpfs.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
> From: Shai Erera [mailto:[email protected]]
> Sent: Tuesday, December 20, 2011 2:13 PM
> To: [email protected]
> Subject: Plans to remove RAMDirectory?
>
> Hi
>
> Uwe mentioned on LUCENE-3653 that there are plans to remove RAMDirectory
> from Trunk and move to tests only: "RAMDirectory is written for tests, not
> for production use. There are already plans to remove it from Lucene trunk
> and move to tests only."
> (see full comment
> <https://issues.apache.org/jira/browse/LUCENE-3653?focusedCommentId=13172338&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13172338>)
>
> I wasn't aware of such plans. Were there emails about it, or has it been
> discussed on IRC?
>
> I disagree that RAMDirectory is useful only for tests. For example, when
> someone wants to index on Hadoop, RAMDirectory can be very useful (even
> though it's not the only solution). Also, RAMDirectory is still more
> efficient than MMapDirectory, if you want to index (and then search) on a
> small (sometimes even transient) amount of data. We use it in several cases
> for such purposes.
>
> If RAMDirectory needs to improve (for instance, allocate bigger byte[]
> chunks), then IMO we should do that, rather than drop it entirely from
> core. I think it's a very valuable Directory implementation that Lucene
> offers, and I'd hate to see it disappear.
>
> Shai
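
For readers who want the off-heap buffer idea from the quoted message in concrete form, here is a rough sketch in plain JDK Java. It is only an illustration under assumptions, not Lucene code and not the planned directory: the class name DirectBlockStore, its methods, and the 1 MiB block size are all made up for this example. It shows bytes being kept in large, configurable blocks allocated with ByteBuffer.allocateDirect (outside the JVM heap) instead of many small on-heap byte[] chunks. It is not thread-safe as written.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration only: not Lucene code.
 * Stores bytes in large fixed-size blocks allocated outside the JVM heap.
 */
final class DirectBlockStore {
    private final int blockSize;                        // assumed tunable, not a fixed 8 KB
    private final List<ByteBuffer> blocks = new ArrayList<>();
    private long length;                                // highest written position + 1

    DirectBlockStore(int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockSize = blockSize;
    }

    /** Writes one byte at an absolute position, allocating direct blocks as needed. */
    void writeByte(long pos, byte b) {
        if (pos < 0) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        int blockIndex = (int) (pos / blockSize);
        while (blocks.size() <= blockIndex) {
            // Direct buffers live outside the heap and start zero-filled.
            blocks.add(ByteBuffer.allocateDirect(blockSize));
        }
        // Absolute put: does not move the buffer's position.
        blocks.get(blockIndex).put((int) (pos % blockSize), b);
        length = Math.max(length, pos + 1);
    }

    /** Reads one byte from an absolute position that lies within the written range. */
    byte readByte(long pos) {
        if (pos < 0 || pos >= length) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        return blocks.get((int) (pos / blockSize)).get((int) (pos % blockSize));
    }

    long length() {
        return length;
    }

    int blockCount() {
        return blocks.size();
    }

    public static void main(String[] args) {
        DirectBlockStore store = new DirectBlockStore(1 << 20); // 1 MiB blocks (arbitrary choice)
        store.writeByte(0, (byte) 1);
        store.writeByte((1 << 20) + 5, (byte) 42);              // falls into the second block
        System.out.println(store.readByte((1 << 20) + 5) + " in " + store.blockCount() + " blocks");
    }
}
```

With a larger block size, a big file needs far fewer block switches while seeking and reading, which is the cost the quoted message attributes to the 8 KB buffers. If the approximate file size is known up front (for example before a merge), the block size could be chosen to match.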
