On Wed, Jun 10, 2009 at 7:23 PM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:

> Cool! Sounds like with LUCENE-1458 we can experiment with some
> of these things. Does CSF become just another codec?

I believe LUCENE-1458 currently only makes terms dict & postings
pluggable...

>> I'm leery of having terms dict live entirely on disk, though
>> we should certainly explore it.
>
> Yeah, it should theoretically help with reloading; it could use
> a skiplist (as we have a disk version of that implemented)
> instead of binary search. It seems like with things like
> TrieRange (which potentially adds many fields and terms) it
> could be useful to let the IO cache calculate what we need in
> RAM and what we don't, otherwise we're constantly at risk of
> exceeding heap usage. There'll be other potential RAM issues
> (such as page faults), but it seems like users will constantly
> be up against the inability to precalculate Java heap usage of
> data structures (whereas file based data usage can be measured).
> Norms are another example, and with flexible indexing (and
> scoring?) there may be additional fields the user may want to
> change dynamically, that if completely loaded into heap cause
> OOM problems.
>
> I guess I personally think it would be great to not worry about
> exceeding heap with Lucene apps (as it's a guessing game), and
> then one can simply analyze the OS level IO cache/swap space to
> see if the app could slow down due to the machine not having
> enough RAM. I think this would remove one of the major
> differences between a Java based search engine and a C++ based
> one.

Marvin and I discussed this quite a bit already in LUCENE-1458... we
should make it pluggable and then try both -- let the machine tell us ;)

Mike