Erick Erickson <erickerick...@gmail.com> wrote:
> I think part of it is locality. By that I mean two docValues fields in
> the same document have no relation to each other in terms of their
> location on disk. So _assuming_ all your DocValues can't be contained
> in memory, you may be doing a bunch of disk seeks.

Fair enough: Doc Values overhead scales linearly with the number of fields 
retrieved, whereas the cost of stored fields is closer to constant per document. 
As you note with export, Doc Values can be faster than stored with a few fields, 
but using them for hundreds would probably be quite a lot slower.
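
To make the scaling argument concrete, here is a toy cost model. It is only a 
sketch: the constants and function names are invented for illustration and are 
not measurements of Lucene or Solr. It assumes one lookup per Doc Values field 
and one mostly fixed fetch-and-decompress step per stored document.

```python
# Toy cost model. All constants are hypothetical, chosen only to show the shape
# of the trade-off; they are NOT measured from Lucene or Solr.
STORED_BASE_COST = 50.0       # assumed: seek + block decompression per document
STORED_PER_FIELD_COST = 0.05  # assumed: small decode cost per stored field
DV_PER_FIELD_COST = 1.0       # assumed: one independent lookup per DV field


def stored_cost(num_fields: int) -> float:
    """Cost of populating a document from stored fields (roughly constant)."""
    return STORED_BASE_COST + STORED_PER_FIELD_COST * num_fields


def doc_values_cost(num_fields: int) -> float:
    """Cost of populating a document from Doc Values (linear in field count)."""
    return DV_PER_FIELD_COST * num_fields


def break_even_fields() -> int:
    """Smallest field count at which Doc Values stop being cheaper."""
    n = 1
    while doc_values_cost(n) < stored_cost(n):
        n += 1
    return n


if __name__ == "__main__":
    for n in (1, 10, 100, 500):
        print(n, doc_values_cost(n), stored_cost(n))
    print("break-even at", break_even_fields(), "fields")
```

With real numbers the break-even point would depend on caching, disk type and 
field sizes, so the model only shows why "a few fields" and "hundreds of 
fields" can land on opposite sides of it.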

> And maybe part of it is the notion of stuffing large text fields into
> a DocValues field just to return it seems like abusing DV.

That seems like a reasonable explanation to me. If that is what the talk of 
misuse is about, I can understand it. It is not a case I have any current 
interest in optimizing, and I agree that "real" compression (as opposed to the 
light prefix-reuse in Doc Values) is the best choice there.

> That said, the Streaming code uses DV fields exclusively and I got
> 200K rows/second returned without tuning a single thing which I doubt
> you're going to get with stored fields!

> So I think as usual, "it depends".

I would like to think so, as that implies that it does make sense to consider 
whether changes to the Doc Values codec representation cause a performance 
regression when Doc Values are used to populate documents.

- Toke Eskildsen
