[
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072712#comment-15072712
]
Shalin Shekhar Mangar commented on SOLR-8220:
---------------------------------------------
I had a chat with David on IRC. In summary, his objections are:
# The primary one is the default: are stored=false & docValues=true fields
returned from globs by default (whatever we name the field/fieldtype
attributes)? He expects it would often be set to false, assuming many schemas
index text both ways.
# Globs only match a DV field if useDocValuesAsStored=true and so the property
name should be indicative of that
# Perhaps if we have a matchesFlGlob parameter we would then have no need for
useDocValuesAsStored?
I was of the opinion that matchesFlGlob=true is not a great name: what if we
use the JSON request API in future and there's no "fl", so to speak? David
then also suggested an alternate name e.g. matchesReturnGlob: whether a glob
(asterisk) pattern in an ‘fl’ parameter (or equivalent) will match this field.
Applies to stored or docValues fields. Defaults to what stored is set to.
I'm still not sure about this, David. What does stored=true,
matchesReturnGlob=false mean? Should that even be allowed?
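To make the question concrete, here is a minimal sketch of the rule as David described it. It is not taken from any patch on this issue: FieldDef, effectiveMatchesReturnGlob and expandGlob are hypothetical names used only for illustration, and the schema is modelled with a plain class rather than Solr's own schema classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReturnGlobSketch {

    /** Hypothetical stand-in for a schema field; not Solr's SchemaField. */
    static final class FieldDef {
        final String name;
        final boolean stored;
        final boolean docValues;
        final Boolean matchesReturnGlob; // null means "not set in the schema"

        FieldDef(String name, boolean stored, boolean docValues, Boolean matchesReturnGlob) {
            this.name = name;
            this.stored = stored;
            this.docValues = docValues;
            this.matchesReturnGlob = matchesReturnGlob;
        }

        /** Proposed default: fall back to whatever stored is set to. */
        boolean effectiveMatchesReturnGlob() {
            return matchesReturnGlob != null ? matchesReturnGlob : stored;
        }

        /** The property only applies to stored or docValues fields. */
        boolean retrievable() {
            return stored || docValues;
        }
    }

    /** Names of the fields that a bare "*" glob would return under the proposal. */
    static List<String> expandGlob(List<FieldDef> schema) {
        List<String> matched = new ArrayList<>();
        for (FieldDef f : schema) {
            if (f.retrievable() && f.effectiveMatchesReturnGlob()) {
                matched.add(f.name);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<FieldDef> schema = Arrays.asList(
            new FieldDef("id", true, false, null),           // stored, default applies
            new FieldDef("price", false, true, true),        // DV-only, opted in
            new FieldDef("title_sort", false, true, null),   // DV-only, default is false
            new FieldDef("body", true, false, false));       // stored but opted out
        System.out.println(expandGlob(schema));
    }
}
```

The last field, "body", is the case I am asking about: under this rule a stored field could be hidden from a glob while still being returned when named explicitly.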
Replying to some of your earlier points:
bq. The main reason is that for returning the top-X docs with more than a few
fields, row storage will probably be faster than columnar storage,
performance-wise.
Agreed, and that's why we need to figure out the specifics in SOLR-8344 so that
we choose the right way of retrieval.
bq. Another reason is to retain a particular ordering for multi-valued fields
We can always try to use stored values for multi-valued fields in
SOLR-8344.
bq. Another reason is that Solr's highlighters don't read from docValues
(solvable).
Why is this a reason for not using DocValues on the original field?
bq. Ideally most users wouldn't want to monkey with such parameters (IMO). But
most schemas I've seen have at least one occurrence of an original input string
indexed two ways for search & sorting/faceting. And if our example schemas do,
thus motivating us to set it to false for these fields, it just re-confirms my
point.
The thing is the copy fields you refer to are deliberately added and I think
it should be okay to expect that users will choose the value for
useDocValuesAsStored according to their use-cases for such fields. It is also
common for people to have docValues=true and stored=true together just because
they believe that it isn't possible to retrieve doc values.
What do you think?
> Read field from docValues for non stored fields
> -----------------------------------------------
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
> Issue Type: Improvement
> Reporter: Keith Laban
> Assignee: Shalin Shekhar Mangar
> Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch,
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch,
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which
> requires redundant data to be stored on disk. Since reading from docValues is
> both efficient and a common practice (facets, analytics, streaming, etc),
> reading values from docValues when a stored version of the field does not
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as
> they would always be returned sorted in the docValues approach. I believe
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think
> it should live closer to where stored fields are loaded in the
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues,
> facets, analytics, streaming, etc, all seem to be doing their own ways,
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
> -- return field from docValue if the field is not stored and in docValues,
> if the field is stored return it from stored fields
> - fl="*"
> -- return only stored fields
> - fl="+"
> -- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first
> pass. 2b - is current behavior