Erick Erickson <erickerick...@gmail.com> wrote:
> I think part of it is locality. By that I mean two docValues fields in
> the same document have no relation to each other in terms of their
> location on disk. So _assuming_ all your DocValues can't be contained
> in memory, you may be doing a bunch of disk seeks.

Fair enough: Doc Values overhead scales linearly with the number of fields, whereas stored fields are more constant-ish. As you note with export, Doc Values can be faster than stored fields with a few fields, but using them for hundreds would probably be quite a lot slower.

> And maybe part of it is the notion of stuffing large text fields into
> a DocValues field just to return it seems like abusing DV.

That seems like a reasonable explanation to me. If that is what the talk of misuse is about, I can understand it. It is not a case I have any current interest in optimizing, and I agree that "real" compression (as opposed to the light prefix-reuse from Doc Values) is the best choice.

> That said, the Streaming code uses DV fields exclusively and I got
> 200K rows/second returned without tuning a single thing which I doubt
> you're going to get with stored fields!

> So I think as usual, "it depends".

I would like to think so, as that implies that it does make sense to consider whether changes to the Doc Values codec representation cause a performance regression when using them to populate documents.

- Toke Eskildsen