[jira] [Commented] (SOLR-8220) Read field from docValues for non stored fields

Yonik Seeley (JIRA) Mon, 28 Dec 2015 07:03:05 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072789#comment-15072789
 ]


Yonik Seeley commented on SOLR-8220:
------------------------------------

{quote}
> (Yonik) Going forward, why wouldn't one just use docValues on the original 
> field [ rather than a copyField ] ?
The main reason is that for returning the top-X docs with more than a few 
fields, row storage will probably be faster than columnar storage, 
performance-wise.
{quote}

If one still wants row-stored fields for performance reasons (the exact 
cross-over point will hopefully be determined by SOLR-8344), they can still 
do that, and it makes sense to have both row-stored and column-stored data on 
the original field.  Sorry if it wasn't clear, but that was my original point.

bq. It [ useDocValuesAsStored ] basically means will a 'fl' glob match the DV 
field or not.
That's only one aspect.  Every place that "uses" stored fields should treat a 
DV field as a stored field.  It also affects streaming expressions / SQL, 
highlighting, partial updates, etc.

Although I can see why you might think that at first - the fact that the 
current implementation does return fl=exact_field_name even when 
useDocValuesAsStored=false was a surprise to me as well.  I don't see it as a 
big deal though (more like a minor syntactic shortcut for a pseudo field).

To me, useDocValuesAsStored=false means "this isn't really a stored field... 
it's just an implementation artifact for some query-time feature we need".
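
A minimal sketch of the 'fl' behavior discussed above, as a toy model in Python rather than Solr's actual code. The names FieldDef and resolve_fl are hypothetical and exist only for illustration. It assumes what this thread describes: a glob picks up a docValues-only field only when useDocValuesAsStored is true, while an exact field name is returned either way.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class FieldDef:
    """Hypothetical stand-in for a schema field; not a Solr class."""
    name: str
    stored: bool = False
    doc_values: bool = False
    use_doc_values_as_stored: bool = True


def resolve_fl(fl, schema):
    """Map each field an 'fl' value would return to the source it is read from."""
    result = {}
    for pattern in fl.replace(",", " ").split():
        is_glob = any(ch in pattern for ch in "*?")
        for field in schema:
            if not fnmatchcase(field.name, pattern):
                continue
            if field.stored:
                # A stored copy exists, so it is read from stored fields.
                result[field.name] = "stored"
            elif field.doc_values and (field.use_doc_values_as_stored or not is_glob):
                # docValues-only field: globs match it only when it is
                # flagged as stored-like; an exact name always matches.
                result[field.name] = "docValues"
    return result


EXAMPLE_SCHEMA = [
    FieldDef("id", stored=True),
    FieldDef("title", stored=True, doc_values=True),
    FieldDef("price", doc_values=True),
    FieldDef("sort_key", doc_values=True, use_doc_values_as_stored=False),
]
```

With this toy schema, resolve_fl("*", EXAMPLE_SCHEMA) leaves out sort_key, while resolve_fl("sort_key", EXAMPLE_SCHEMA) still returns it from docValues.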

> Read field from docValues for non stored fields
> -----------------------------------------------
>
>                 Key: SOLR-8220
>                 URL: https://issues.apache.org/jira/browse/SOLR-8220
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Keith Laban
>            Assignee: Shalin Shekhar Mangar
>         Attachments: SOLR-8220-5x.patch, SOLR-8220-branch_5x.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values from docValues; 
> facets, analytics, streaming, etc. all seem to do it their own way. Perhaps 
> some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>    -- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior
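
The multiValued caveat in the description above can be illustrated with a small Python sketch. It is a toy model, not Solr code, and both function names are hypothetical. It assumes a multi-valued string field backed by sorted-set docValues, which keep each document's values sorted and de-duplicated, whereas stored fields keep the values as they were indexed.

```python
def read_stored(values):
    """Stored fields return values in the order they were indexed."""
    return list(values)


def read_sorted_set_doc_values(values):
    """Sorted-set docValues keep unique values in sorted order.

    Python's string ordering is used here as a stand-in; it agrees with
    byte-wise term order for plain ASCII values like the ones below.
    """
    return sorted(set(values))


indexed = ["red", "blue", "red", "green"]
from_stored = read_stored(indexed)
from_doc_values = read_sorted_set_doc_values(indexed)
```

Under these assumptions from_stored keeps the duplicate "red" and the original order, while from_doc_values comes back as ["blue", "green", "red"].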



