[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018489#comment-15018489
 ] 

Ishan Chattopadhyaya edited comment on SOLR-8220 at 11/20/15 6:43 PM:
----------------------------------------------------------------------

{quote}
 We are adding fields retrieved from docValues by doing the following:
{code}
doc.add(schemaField.getType().createField(schemaField, 
sdv.get(docid).utf8ToString(), 1.0f));
{code}

this {{createField}} call is returning {{null}} based on the code I wrote 
above. Perhaps we need to create fields differently, or change how 
{{createField}} works.
{quote}
[Referencing this comment from SOLR-8316]. Can we "decorate" the SolrDocument 
in DocStreamer instead of trying to do that with the StoredDocument from 
Lucene? That would give us these benefits: (a) we won't need to fix SOLR-8316 
(although I still don't understand how it affects the work here, since I 
thought createField was doing its job and consequently the tests were 
passing; maybe I'm missing something), (b) we can leave the StoredDocument as 
is and not change it from under the document cache (which is probably an 
awkward thing in the current patch), (c) SolrDocument has an efficient 
containsKey(), if needed, so the linear O(n) lookup cost can be avoided. Then 
again, point (b) means we won't need containsKey() anyway.
This also means that SOLR-8276 will have to change: there we would have to 
decorate a SolrInputDocument instead of a SolrDocument. 
Keith, Yonik, what do you think?
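
To make the suggestion concrete, here is a rough sketch of what "decorating" the response document rather than the cached one could look like. It is plain Java with no Solr or Lucene dependency: {{StoredField}}, {{decorate}} and the map-based response document are hypothetical stand-ins (not the real StoredDocument, SolrDocument or DocStreamer classes), and it assumes single-valued fields for brevity.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DecorateSketch {
    /** Hypothetical stand-in for one field of a cached stored document. */
    static final class StoredField {
        final String name;
        final Object value;
        StoredField(String name, Object value) { this.name = name; this.value = value; }
    }

    /**
     * Builds a per-response document from the cached stored fields and then
     * adds docValues-only fields to that copy. The cached list is only read,
     * never modified, so nothing changes underneath the document cache.
     */
    static Map<String, Object> decorate(List<StoredField> cachedStoredDoc,
                                        Map<String, Object> docValuesForDoc) {
        Map<String, Object> responseDoc = new LinkedHashMap<>();
        for (StoredField f : cachedStoredDoc) {
            responseDoc.put(f.name, f.value);
        }
        for (Map.Entry<String, Object> e : docValuesForDoc.entrySet()) {
            // Hash lookup instead of a linear scan; a stored value wins if present.
            responseDoc.putIfAbsent(e.getKey(), e.getValue());
        }
        return responseDoc;
    }
}
```

The point of the sketch is only that the map-backed copy gives a cheap "is this field already there?" check and that the cached object stays untouched; the real change would live wherever DocStreamer converts documents.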



> Read field from docValues for non stored fields
> -----------------------------------------------
>
>                 Key: SOLR-8220
>                 URL: https://issues.apache.org/jira/browse/SOLR-8220
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Keith Laban
>         Attachments: SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a field will be both stored="true" and docValues="true", which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc.), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values from docValues: 
> facets, analytics, streaming, etc. all seem to do it their own way. 
> Perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return the field from docValues if it is not stored and has docValues; 
> if the field is stored, return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>    -- return stored fields and docValue fields
> The first option (2a) would be the easiest to implement and might be 
> sufficient for a first pass. The second (2b) is the current behavior.
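
The fl behavior proposed in the description above could be sketched roughly as follows. This is an illustration only, in plain Java with no Solr dependency: {{FieldInfo}}, {{resolve}} and {{source}} are hypothetical names, and fl is assumed to be either a single field name, "*" or "+".

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class FlSketch {
    /** Hypothetical stand-in for the schema flags of one field. */
    static final class FieldInfo {
        final boolean stored;
        final boolean docValues;
        FieldInfo(boolean stored, boolean docValues) {
            this.stored = stored;
            this.docValues = docValues;
        }
    }

    /** Returns the names of the fields that would be returned for the given fl value. */
    static Set<String> resolve(String fl, Map<String, FieldInfo> schema) {
        Set<String> out = new LinkedHashSet<>();
        for (Map.Entry<String, FieldInfo> e : schema.entrySet()) {
            FieldInfo f = e.getValue();
            boolean retrievable = f.stored || f.docValues;
            boolean selected;
            if ("*".equals(fl)) {
                selected = f.stored;            // only stored fields
            } else if ("+".equals(fl)) {
                selected = retrievable;         // stored fields and docValues fields
            } else {
                selected = fl.equals(e.getKey()) && retrievable; // explicitly named field
            }
            if (selected) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    /** Where a selected field's value would be read from: stored fields take precedence. */
    static String source(FieldInfo f) {
        return f.stored ? "stored" : (f.docValues ? "docValues" : "none");
    }
}
```

With a schema containing a stored-only field, a docValues-only field and a field with both flags, "*" selects the two stored fields, "+" selects all three, and naming the docValues-only field explicitly selects it with its value read from docValues.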


