I feel that we should offer both... somehow. If we make the default Codec conservative, and all the "good stuff" experimental, people might prefer to use a conservative Lucene (just because the other path means being adventurous or very expert), and perhaps then we'll get conservative results compared to others...
We should at least try not to break back-compat when it's not necessary. E.g., if the format changes such that we can still keep the old format for old indexes (read-only) under a different name, then we should consider it. I know it means we'll need to maintain two Codecs wrt API changes, but I hope that's not common.

I would like to see an out-of-the-box Lucene which comes with the best defaults (and I feel we already do!) but lets users customize them to some level without feeling that they lose the support of the community. And some Codecs will just remain experimental for a long time until they are stabilized. That's fine too.

Shai

On Jul 19, 2013 4:32 PM, "Robert Muir" <[email protected]> wrote:

> On Fri, Jul 19, 2013 at 9:27 AM, Shai Erera <[email protected]> wrote:
>
>> Fair enough, let's resolve it in a way that makes everyone happy. Not
>> being able to use DV on-disk "officially" seems like a drawback to me.
>>
>> I still prefer that we support both in-memory and on-disk and even
>> default to in-memory, because Lucene should have great performance out of
>> the box, and these days RAM is not so much an issue. Perhaps we should
>> explore writing DirectDVFormat which keeps everything in memory outside the
>> heap (this is for a separate issue).
>>
>> I get what you're saying about supporting formats, but I think that if
>> on-disk vs in-memory (for any format) means "same representation on disk,
>> different behavior at runtime", we should be able to support both variants?
>> That way, in-memory is just an optimized implementation detail, and is not
>> strictly a "Format".
>
> But it's not: they use different data structures and so on. Different codecs.
>
> I don't think we should default to loading everything in heap. This is
> really trappy and it's not interesting to me that fieldcache used to do
> this, it's not relevant at all to what we are doing now. We should leave
> this to the OS like we do with postings lists and so on. I think diskdv is
> "correct" in that it loads up the minimal stuff it should need into heap to
> keep things fast (the monotonic blockpacked readers for addressing, which
> are typically very small).
