[
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072425#comment-15072425
]
David Smiley commented on SOLR-8220:
------------------------------------
bq. (Yonik) Going forward, why wouldn't one just use docValues on the original
field?
The main reason is that for returning the top-X docs with more than a few
fields, row storage (stored fields) will probably be faster than columnar
storage (docValues). So at index time you can pay for both if you need
columnar for other reasons (sorting/faceting). A second reason is to retain a
particular ordering for multi-valued fields. A third is that Solr's
highlighters don't read from docValues (solvable).
bq. (Ishan) David, do you think setting useDocValuesAsStored to false on the
copyField targets in our example schemas partly alleviates the problem?
Yes, we should do that. Ideally most users wouldn't want to monkey with such
parameters (IMO). But most schemas I've seen have at least one occurrence of
an original input string indexed two ways, for search and for
sorting/faceting. If our example schemas do too, and that motivates us to set
it to false for those fields, it just reconfirms my point.
Shalin: I very much care about ease of documenting/explaining this; I thought
my comments showed that. I guess we just see this issue differently. I'm
coming around to a new interpretation of what useDocValuesAsStored is, as it
was committed and as you and Ishan clarified today. It basically means whether
an 'fl' glob will match the DV field or not. If my understanding is correct,
then I think this is evidence I'm on to something with my "matchesFlGlob"
suggestion. You are free to disagree, but I think it's extremely easy to
document/describe/teach what matchesFlGlob means, particularly if its scope is
expanded to apply to stored fields too.
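A minimal sketch of that interpretation, in Python rather than Solr's Java and
not taken from any patch on this issue. FieldDef and resolve_fl are
hypothetical names. The rule that a field named explicitly in 'fl' is read from
docValues when it is not stored is an assumption based on the issue
description; only the glob case consults useDocValuesAsStored.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class FieldDef:
    """Hypothetical stand-in for a schema field; not a Solr class."""
    name: str
    stored: bool = False
    doc_values: bool = False
    use_doc_values_as_stored: bool = True


def resolve_fl(fl, schema):
    """Map each returned field name to its source: 'stored' or 'docValues'."""
    result = {}
    for pattern in fl.replace(",", " ").split():
        is_glob = any(ch in pattern for ch in "*?")
        for field in schema:
            if not fnmatchcase(field.name, pattern):
                continue
            if field.stored:
                # Stored fields are always read from stored-field storage.
                result[field.name] = "stored"
            elif field.doc_values and (not is_glob or field.use_doc_values_as_stored):
                # Assumption: an explicit name always falls back to docValues;
                # a glob does so only when useDocValuesAsStored is true.
                result[field.name] = "docValues"
    return result


schema = [
    FieldDef("id", stored=True),
    FieldDef("title", stored=True),
    # copyField target used only for sorting, hidden from globs
    FieldDef("title_sort", doc_values=True, use_doc_values_as_stored=False),
    FieldDef("price", doc_values=True),
]

print(resolve_fl("*", schema))           # title_sort is absent
print(resolve_fl("title_sort", schema))  # explicit request still returns it
```

Under this reading the flag never hides a field from an explicit request; it
only decides whether a glob picks it up, which is what a name like
matchesFlGlob would say directly.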
FWIW I'll be on IRC. My attempts to ping you haven't received a response.
> Read field from docValues for non stored fields
> -----------------------------------------------
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
> Issue Type: Improvement
> Reporter: Keith Laban
> Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch,
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch,
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which
> requires redundant data to be stored on disk. Since reading from docValues is
> both efficient and a common practice (facets, analytics, streaming, etc),
> reading values from docValues when a stored version of the field does not
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as
> they would always be returned sorted in the docValues approach. I believe
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think
> it should live closer to where stored fields are loaded in the
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values from docValues.
> Facets, analytics, streaming, etc. all seem to do it their own way; perhaps
> some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
> -- return field from docValue if the field is not stored and in docValues,
> if the field is stored return it from stored fields
> - fl="*"
> -- return only stored fields
> - fl="+"
> -- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first
> pass. 2b - is current behavior