[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15028962#comment-15028962
 ] 

Shalin Shekhar Mangar commented on SOLR-8220:
---------------------------------------------

Guys, sorry for not paying attention to this earlier, but after a quick read 
through the comments and an offline conversation with Ishan, I want to point 
out a few things.

bq. Theoretical optimization, will skip reading from stored fields if all the 
requested fields are available in docValues

bq. I don't think some of the Lucene folks want docValues modeled as stored 
fields at the Lucene level.

From a performance perspective, always reading values from docValues (when 
they exist) can be horrible, because each field access in docValues may need 
a random disk seek, whereas all stored fields for a document are kept together 
and need only one random seek and a sequential block read. That, and the fact 
that docValues aren't in the document cache, makes me think that we should not 
model docValues as stored fields and treat the two equivalently. At least not 
without supporting benchmarks.
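
To make the seek argument concrete, here is a rough worst-case model. It is 
only a sketch: the function names are hypothetical, and it assumes one random 
seek per docValues field access and one per stored document, ignoring the OS 
page cache and Solr's document cache. It is no substitute for benchmarks.

```python
def worst_case_seeks_docvalues(num_docs: int, num_fields: int) -> int:
    """Assumed worst case: one random seek per (document, field) pair."""
    return num_docs * num_fields


def worst_case_seeks_stored(num_docs: int) -> int:
    """Assumed: one random seek per document, then a sequential block read."""
    return num_docs


if __name__ == "__main__":
    docs, fields = 10, 5
    print("docValues seeks:", worst_case_seeks_docvalues(docs, fields))
    print("stored seeks:   ", worst_case_seeks_stored(docs))
```

Under these assumptions the docValues path grows with rows times fields, while 
the stored path grows with rows only.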

So my suggestion is that we not mix the two issues, i.e. keep this issue 
focused on adding syntactic sugar to read fields from docValues for non-stored 
fields, in whichever of the proposed ways. By the way, this is already 
possible using the 'field' DocTransformer, e.g. fl=field(my_dv_field)
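
For illustration, a request using that fl value could be built as below. This 
is a minimal sketch: the base URL, the collection name "mycollection" and the 
helper name are assumptions, and only the fl=field(my_dv_field) syntax comes 
from the comment above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base_url: str, collection: str, dv_field: str,
                     query: str = "*:*") -> str:
    """Build a /select URL whose fl asks for dv_field via field(...)."""
    params = {"q": query, "fl": f"field({dv_field})"}
    # urlencode percent-encodes the parentheses, colon and asterisks.
    return f"{base_url}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local Solr instance and collection.
    url = build_select_url("http://localhost:8983/solr", "mycollection",
                           "my_dv_field")
    print(url)
    # Decoding the query string gives back the original fl value.
    print(parse_qs(urlsplit(url).query)["fl"])
```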

> Read field from docValues for non stored fields
> -----------------------------------------------
>
>                 Key: SOLR-8220
>                 URL: https://issues.apache.org/jira/browse/SOLR-8220
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Keith Laban
>         Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>    -- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior
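
The fl behavior proposed in the quoted description can be sketched as follows. 
This is a toy model, not Solr code: FieldInfo and resolve_fl are hypothetical 
names, and the schema is a plain list rather than a real Solr schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldInfo:
    name: str
    stored: bool
    doc_values: bool


def resolve_fl(fl: str, schema: list) -> dict:
    """Map each returned field name to 'stored' or 'docValues'."""
    by_name = {f.name: f for f in schema}
    if fl == "*":
        # Proposed: "*" returns only stored fields.
        candidates = [f for f in schema if f.stored]
    elif fl == "+":
        # Proposed: "+" returns stored fields and docValues fields.
        candidates = [f for f in schema if f.stored or f.doc_values]
    else:
        # A single explicitly named field.
        candidates = [by_name[fl]] if fl in by_name else []

    sources = {}
    for f in candidates:
        if f.stored:
            sources[f.name] = "stored"      # stored wins when both exist
        elif f.doc_values:
            sources[f.name] = "docValues"   # fallback for non-stored fields
    return sources


if __name__ == "__main__":
    schema = [
        FieldInfo("title", stored=True, doc_values=False),
        FieldInfo("price", stored=False, doc_values=True),
        FieldInfo("id", stored=True, doc_values=True),
    ]
    for fl in ("price", "*", "+"):
        print(fl, resolve_fl(fl, schema))
```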



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
