Consider a Lucene index consisting of 10 million documents with a total disk footprint of 3 GB. Consider an application that treats this index as read-only and runs very complex queries over it: queries with many terms, some of them 'fuzzy', some of them 'should' clauses, all wrapped up in a dismax. And, finally, consider doing all this on a box with over 100 GB of physical memory, some cores, and nothing else to do with its time.
I should probably just stop here and see what thoughts come back, but I'll go out on a limb and type the word 'codec'. The MMapDirectory, of course, cheerfully gets to keep every single bit in memory. And then each query runs, exercising the codec and building up a flurry of Java objects, all of which turn into garbage, and then we start all over. So I find myself wondering: is there some sort of opportunity for a codec-that-caches in here? In other words, I'd like to sell some of my space to buy some time.
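To make the space-for-time idea concrete, here is a minimal sketch of the kind of cache I have in mind. Everything in it is hypothetical: `DecodedBlockCache`, its `decoder` function, and the key and value types are names I made up for illustration, and none of it is Lucene API. It only shows the shape of the trade: keep the decoded objects around in a bounded LRU map instead of rebuilding them on every query. Since the index is read-only, nothing ever needs to be invalidated.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical illustration only; not part of Lucene.
 * A bounded LRU cache that keeps already-decoded values in memory,
 * so repeated lookups skip the decode step and create no new garbage.
 */
final class DecodedBlockCache<K, V> {
    private final Function<K, V> decoder; // stands in for the expensive decode work
    private final Map<K, V> lru;
    private long hits;
    private long misses;

    DecodedBlockCache(int maxEntries, Function<K, V> decoder) {
        this.decoder = decoder;
        // accessOrder = true makes iteration order least-recently-used first.
        this.lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once the budget is exceeded.
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value, decoding and storing it on a miss. */
    synchronized V get(K key) {
        V value = lru.get(key);
        if (value != null) {
            hits++;
            return value;
        }
        misses++;
        value = decoder.apply(key); // a null result is returned but not cached
        if (value != null) {
            lru.put(key, value);
        }
        return value;
    }

    synchronized long hits() { return hits; }

    synchronized long misses() { return misses; }

    synchronized int size() { return lru.size(); }
}
```

The single lock is the simplest thing that is correct with several query threads; with real cores behind it, something sharded or lock-free would presumably be needed, and the budget would want to be in bytes rather than entries.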