[
https://issues.apache.org/jira/browse/SOLR-8344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15033802#comment-15033802
]
Yonik Seeley commented on SOLR-8344:
------------------------------------
bq. But are you also arguing for always loading fields from docvalues even if
they are stored?
If a client requests fl=a,b,c (and these three fields all have docvalues *and*
are stored), it may be slower using docvalues *if* they are not cached yet.
The question then becomes: why are they not cached?
- this is a one-off query, the docValues are not normally used
-- this is a case we should not be optimizing too much for
- this is going to be a very common query
-- in this case, we should use docvalues anyway: the average latency will
drop as things get cached.
If we're requesting a large result set, it probably makes sense to use
docvalues: every cache miss brings in 4K of that column, so subsequent
accesses become less likely to miss (vs the same scenario in stored
fields). If the sort is by _docid_ then access will even be linear, meaning
there will be few cache misses. OS read-ahead being triggered will reduce that
even further.
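
The locality argument above can be illustrated with a toy page-cache model. This is only a sketch and not a model of Lucene's actual file formats: it assumes fixed-width 8-byte docvalues entries, a hypothetical 512-byte uncompressed stored document, 4 KiB pages, and a cache that starts cold and never evicts. It does not model OS read-ahead. All names (`count_page_misses`, `compare`) are made up for illustration.

```python
import random

PAGE_SIZE = 4096             # bytes pulled in per cache miss (the "4K" above)
DV_BYTES_PER_DOC = 8         # assumption: fixed-width docvalues entry per field
STORED_BYTES_PER_DOC = 512   # assumption: uncompressed stored document


def count_page_misses(doc_ids, bytes_per_doc, page_size=PAGE_SIZE):
    """Count cold-cache page misses when reading one record per doc id.

    The first touch of a page is a miss and every later touch is a hit
    (no eviction). Only the page holding the record's first byte is counted.
    """
    cached_pages = set()
    misses = 0
    for doc_id in doc_ids:
        page = (doc_id * bytes_per_doc) // page_size
        if page not in cached_pages:
            cached_pages.add(page)
            misses += 1
    return misses


def compare(num_docs, result_size, num_fields=3, seed=0):
    """Return page-miss counts for three ways of fetching `num_fields` fields."""
    rng = random.Random(seed)
    random_ids = rng.sample(range(num_docs), result_size)
    # Assumption: a match-all query sorted by _docid_ returns the first N docs.
    docid_order_ids = range(result_size)
    return {
        # Each docvalues field is its own column, so misses add up per field.
        "docvalues_random": num_fields
        * count_page_misses(random_ids, DV_BYTES_PER_DOC),
        "docvalues_docid_order": num_fields
        * count_page_misses(docid_order_ids, DV_BYTES_PER_DOC),
        # A stored document holds all fields together: one read per doc.
        "stored_random": count_page_misses(random_ids, STORED_BYTES_PER_DOC),
    }


if __name__ == "__main__":
    for result_size in (10, 100_000):
        print(result_size, compare(1_000_000, result_size))
```

Under these assumptions, a tiny result set tends to cost more misses via docvalues (up to one per field per row), which matches the "may be slower if not cached yet" case. For a 100,000-row result over one million documents, the docvalues misses are bounded by the size of the three columns (at most 3 x 1,954 pages), while the stored-field reads touch at least 12,500 pages.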
If the index is so massive that the docvalues for these three fields can't be
cached for the random access case, then how will docvalues compare to stored
values?
With a disk-seek-per-doc-access, this is going to be a slow system regardless,
and very specialized (i.e. if one can't effectively cache these fields, then
things like sorting/faceting on these fields will be slow as well).
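
To put rough numbers on the seek-per-doc case, here is a back-of-the-envelope sketch. The 10 ms seek time is an assumed figure for a spinning disk, not a measurement, and the function name is hypothetical. It treats stored fields as one seek per document and uncached docvalues as one seek per field per document.

```python
def worst_case_fetch_seconds(rows, seeks_per_doc, seek_ms=10.0):
    """Time to fetch `rows` documents if every access is a disk seek.

    seek_ms is an assumed average seek latency, not a measured value.
    """
    return rows * seeks_per_doc * seek_ms / 1000.0


if __name__ == "__main__":
    rows = 1000
    # Stored fields: assume one seek per document.
    print("stored:   ", worst_case_fetch_seconds(rows, seeks_per_doc=1), "s")
    # Docvalues for fl=a,b,c: assume one seek per field per document.
    print("docvalues:", worst_case_fetch_seconds(rows, seeks_per_doc=3), "s")
```

With these assumed numbers, both paths take on the order of tens of seconds for 1,000 rows, which is the sense in which such a system is slow regardless of the choice.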
Based on what we know now, it feels like docValues is the right default.
Benchmarking to verify our assumptions would be a good thing.
> Replace reading stored fields to instead read from docValues
> ------------------------------------------------------------
>
> Key: SOLR-8344
> URL: https://issues.apache.org/jira/browse/SOLR-8344
> Project: Solr
> Issue Type: Bug
> Reporter: Ishan Chattopadhyaya
>
> This issue was discussed in the comments at SOLR-8220. Splitting it out to a
> separate issue so that we can have a focused discussion on whether/how to do
> this.