[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072425#comment-15072425
 ] 

David Smiley commented on SOLR-8220:
------------------------------------

bq. (Yonik) Going forward, why wouldn't one just use docValues on the original 
field?

The main reason is that when returning the top-X docs with more than a few 
fields, row storage will probably be faster than columnar storage.  So at 
index time you can pay for both if you need columnar for other reasons 
(sorting/faceting).  A second reason is to retain a particular ordering for 
multi-valued fields.  A third is that Solr's highlighters don't read from 
docValues (solvable).

bq. (Ishan) David, do you think having the copyField targets to have 
useDocValuesAsStored as false in our example schemas partly alleviates the 
problem?

Yes, we should do that.  Ideally most users wouldn't want to monkey with such 
parameters (IMO).  But most schemas I've seen have at least one occurrence of 
an original input string indexed two ways for search & sorting/faceting.  And 
if our example schemas do too, which is what motivates us to set it to false 
for these fields, that just re-confirms my point.
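
To make the copyField case concrete, here is a rough sketch of what such a 
pair of fields might look like in a schema.  The field names ("title", 
"title_sort") and the field types are assumptions chosen for illustration 
only; they are not taken from any particular example schema.

```xml
<!-- Illustrative only: field names and types are assumed. -->
<!-- Original input, indexed for search and stored for retrieval. -->
<field name="title" type="text_general" indexed="true" stored="true"/>

<!-- Same input, kept as docValues for sorting/faceting.  Setting
     useDocValuesAsStored="false" keeps this copy out of fl=* results. -->
<field name="title_sort" type="string" indexed="false" stored="false"
       docValues="true" useDocValuesAsStored="false"/>

<copyField source="title" dest="title_sort"/>
```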

Shalin: I very much care about ease of documenting/explaining this; I thought 
my comments showed I care.  I guess we just see this issue differently.  I'm 
coming around to a new interpretation of what useDocValuesAsStored is, as it 
was committed and as you and Ishan clarified today.  It basically means 
whether an 'fl' glob will match the DV field or not.  If my understanding is 
correct, then I think this is evidence I'm on to something with my 
"matchesFlGlob" suggestion.  You are free to disagree but I think it's 
extremely easy to document/describe/teach etc. what matchesFlGlob means, 
particularly if its scope is expanded to apply to stored fields too.
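
As a way of pinning down that interpretation, here is a toy model in Python.  
It is only a sketch of the semantics as described above, not Solr's actual 
code: FieldDef, resolve_fl and the sample SCHEMA are hypothetical names made 
up for this example, and the rule that an explicitly named docValues field is 
always returned is an assumption of the sketch.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class FieldDef:
    """Hypothetical, simplified stand-in for a schema field definition."""
    name: str
    stored: bool = False
    doc_values: bool = False
    use_doc_values_as_stored: bool = True


def is_glob(pattern: str) -> bool:
    """Treat any fl entry containing a wildcard as a glob."""
    return "*" in pattern or "?" in pattern


def resolve_fl(fl: list[str], schema: list[FieldDef]) -> list[str]:
    """Return the names of the fields this toy model would hand back for fl."""
    returned = []
    for field in schema:
        if not (field.stored or field.doc_values):
            continue  # nothing to read the value from
        for pattern in fl:
            if is_glob(pattern):
                # Assumed rule: a glob matches stored fields, plus
                # docValues-only fields that have not opted out.
                matched = fnmatchcase(field.name, pattern) and (
                    field.stored or field.use_doc_values_as_stored
                )
            else:
                # Assumed rule: an explicit name always matches.
                matched = pattern == field.name
            if matched:
                returned.append(field.name)
                break
    return returned


# Made-up schema: two stored fields and two docValues-only fields.
SCHEMA = [
    FieldDef("id", stored=True),
    FieldDef("title", stored=True),
    FieldDef("title_sort", doc_values=True, use_doc_values_as_stored=False),
    FieldDef("price", doc_values=True),
]
```

In this model resolve_fl(["*"], SCHEMA) gives ["id", "title", "price"], while 
resolve_fl(["title_sort"], SCHEMA) still gives ["title_sort"], which is the 
"does a glob match it or not" reading of the flag.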

FWIW I'll be on IRC.  My attempts to ping you haven't received a response.

> Read field from docValues for non stored fields
> -----------------------------------------------
>
>                 Key: SOLR-8220
>                 URL: https://issues.apache.org/jira/browse/SOLR-8220
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Keith Laban
>            Assignee: Shalin Shekhar Mangar
>         Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values from docValues; 
> facets, analytics, streaming, etc. all seem to do it their own way. Perhaps 
> some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>   -- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior



