[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

Keith Laban (JIRA) Wed, 18 Nov 2015 12:46:38 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15011925#comment-15011925
 ]


Keith Laban commented on SOLR-8220:
-----------------------------------

bq. If there is a need to distinguish between docValues as an alternative to a 
stored field

I think this would be the case only for multi-valued fields, at least until we 
have an alternative multi-valued docValues type that preserves the original 
field values (i.e. not sorted, not a set), using something like BinaryDocValues 
underneath, as you mentioned earlier. 
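
To make the ordering concern concrete, here is a minimal sketch in plain Java. It does not use any Lucene classes; `DocValuesOrderSketch` and its method names are hypothetical, and a `TreeSet` merely stands in for a sorted-set style representation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only: a TreeSet models a sorted, de-duplicated
// multi-valued field. This is not Lucene's actual docValues implementation.
final class DocValuesOrderSketch {

    // What a stored field would return: the values exactly as indexed.
    static List<String> asStored(List<String> indexed) {
        return new ArrayList<>(indexed);
    }

    // What a sorted-set representation would return: sorted, no duplicates.
    static List<String> asSortedSet(List<String> indexed) {
        return new ArrayList<>(new TreeSet<>(indexed));
    }

    public static void main(String[] args) {
        List<String> indexed = Arrays.asList("pear", "apple", "pear");
        System.out.println(asStored(indexed));
        System.out.println(asSortedSet(indexed));
    }
}
```

For the input above, the stored view keeps all three values in their original order, while the sorted-set view returns only two values in sorted order.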

bq. I'll look at what it will take to modify the LazyDocument to make this work 
differently. Are you already looking into it, or have some thoughts around it?

Doing this properly requires us to know all the possible docValues 
fields on a document up front, and a way for LazyDocument to load the 
lazy field from docValues. 

A major goal of this should be the ability to skip reading stored 
fields altogether if the requested fields are fully satisfied by docValues. 
However, I'm not sure whether using docValues would be more efficient than 
stored fields when all the fields are being returned. 
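
As a rough sketch of that check (the `FieldInfo` type and `needsStoredFields` method are hypothetical names for illustration, not part of SolrIndexSearcher or the schema API), the decision could look something like the following. It assumes that any requested field with docValues can be served from docValues, which ignores the multi-valued ordering caveat.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical sketch: these names are illustrative and do not exist in Solr.
final class StoredFieldSkipSketch {

    // Minimal stand-in for the schema properties relevant to this decision.
    static final class FieldInfo {
        final String name;
        final boolean stored;
        final boolean docValues;

        FieldInfo(String name, boolean stored, boolean docValues) {
            this.name = name;
            this.stored = stored;
            this.docValues = docValues;
        }
    }

    // Returns true if at least one requested field can only be served from
    // stored fields, i.e. it is stored and has no docValues.
    static boolean needsStoredFields(Collection<FieldInfo> requested) {
        for (FieldInfo f : requested) {
            if (f.stored && !f.docValues) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        FieldInfo price = new FieldInfo("price", false, true);
        FieldInfo title = new FieldInfo("title", true, false);
        System.out.println(needsStoredFields(Arrays.asList(price)));
        System.out.println(needsStoredFields(Arrays.asList(price, title)));
    }
}
```

Whether skipping the stored-field read is actually a win when every field is returned would still need to be measured.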

> Read field from docValues for non stored fields
> -----------------------------------------------
>
>                 Key: SOLR-8220
>                 URL: https://issues.apache.org/jira/browse/SOLR-8220
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Keith Laban
>         Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a field will be both stored="true" and docValues="true", which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values from docValues; 
> facets, analytics, streaming, etc. all seem to do it their own way. 
> Perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>    -- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior
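
The proposed fl semantics quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions: `FlResolutionSketch`, `FieldDef`, `Source`, and `resolve` are hypothetical names, the schema is modeled as a plain map, and fl is treated as a single token rather than a full field list.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the proposed fl behavior; not Solr's implementation.
final class FlResolutionSketch {

    enum Source { STORED, DOC_VALUES }

    // Minimal stand-in for a schema field definition.
    static final class FieldDef {
        final boolean stored;
        final boolean docValues;

        FieldDef(boolean stored, boolean docValues) {
            this.stored = stored;
            this.docValues = docValues;
        }
    }

    // Maps each field to return onto the source it would be read from.
    static Map<String, Source> resolve(String fl, Map<String, FieldDef> schema) {
        Map<String, Source> out = new LinkedHashMap<>();
        if (fl.equals("*") || fl.equals("+")) {
            for (Map.Entry<String, FieldDef> e : schema.entrySet()) {
                FieldDef d = e.getValue();
                if (d.stored) {
                    // Stored fields are returned for both "*" and "+".
                    out.put(e.getKey(), Source.STORED);
                } else if (fl.equals("+") && d.docValues) {
                    // Only "+" also pulls in non-stored docValues fields.
                    out.put(e.getKey(), Source.DOC_VALUES);
                }
            }
            return out;
        }
        // Explicit field name: prefer stored, fall back to docValues.
        FieldDef d = schema.get(fl);
        if (d != null) {
            if (d.stored) {
                out.put(fl, Source.STORED);
            } else if (d.docValues) {
                out.put(fl, Source.DOC_VALUES);
            }
        }
        return out;
    }
}
```

With a schema containing a stored field and a docValues-only field, `resolve("*", schema)` would return only the stored field, while `resolve("+", schema)` would return both.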



