[jira] [Commented] (SOLR-11891) BinaryResponseWriter fetches unnecessary fields

Hoss Man (JIRA) Wed, 24 Jan 2018 14:27:32 -0800

    [ https://issues.apache.org/jira/browse/SOLR-11891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338342#comment-16338342 ]


Hoss Man commented on SOLR-11891:
---------------------------------

{quote}convertLuceneDocToSolrDoc should be innocent enough; it merely iterates 
the IndexableField instances in a collection and potentially calls name() on 
them which is always a simple getter. ...
{quote}
That iteration over all fields (real and lazy) is in and of itself the problem 
being reported – if the docs contain 1000 stored fields, but only 1 is 
requested in the 'fl', then the current code loops over all 1000 fields in 
every doc even though it knows exactly which fields it needs. Even if there 
are no disk reads involved for some lazy fields, it's still a wasteful 
iteration whose cost is multiplicative in the # of fields in the docs and the 
number of docs in the response, regardless of how small the fl is.
{quote}...We can't avoid putting all fields on the SolrDocument because of the 
potential for a document transformer to need it – and that's not known by 
RetrieveFieldsOptimizer.
{quote}
IIUC we *can* avoid it and RetrieveFieldsOptimizer *does* know that – as I 
mentioned in my response to mk: that's the entire point of 
{{DocTransformer.getExtraRequestFields()}} (see the javadocs), which is used to 
build up the list returned by {{SolrReturnFields.getLuceneFieldNames()}}.
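
To illustrate the idea, here is a minimal sketch of how fields requested in the 'fl' and extra fields declared by transformers could be merged into one set of needed field names. It does not use the Solr classes: {{Transformer}}, {{RequiredFieldNames}} and {{luceneFieldNames}} are hypothetical stand-ins, and only the method name {{getExtraRequestFields()}} comes from the discussion above.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class RequiredFieldNames {
  /** Hypothetical stand-in for a doc transformer; not the Solr class. */
  interface Transformer {
    /** Extra stored fields this transformer needs, or null if it needs none (assumed contract). */
    String[] getExtraRequestFields();
  }

  /** Union of the explicitly requested fields and whatever the transformers declare. */
  static Set<String> luceneFieldNames(List<String> fl, List<Transformer> transformers) {
    Set<String> names = new LinkedHashSet<>(fl);
    for (Transformer t : transformers) {
      String[] extra = t.getExtraRequestFields();
      if (extra != null) {
        names.addAll(Arrays.asList(extra));
      }
    }
    return names;
  }
}
```

With a set like this in hand, nothing outside it needs to be copied onto the response document, because any transformer that needs another field has already said so.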
{quote}But it should be innocent enough because if no such transformer requests 
the value, then it shouldn't actually be loaded (it's lazy).
{quote}
Even if the {{Document}} field values are lazy, the existing code that loops 
over all of them is still building up the {{SolrDocument}} that contains all of 
those (lazy) fields – wasting time and a small amount of space (and that 
assumes they are all lazy: it's an option, it may not be on for some people – 
if/when they're not lazy then that takes up even more time & space reading them 
from disk)
----
I think the ideal "fix" is for {{SolrReturnFields.getLuceneFieldNames()}} to 
get passed down all the way into {{convertLuceneDocToSolrDoc}} (or 
something we refactor it into) such that we do a runtime check of which list 
is smaller: {{SolrReturnFields.getLuceneFieldNames()}} or 
{{Document.getFields()}} – and then loop over that (smaller) list.
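
A rough sketch of that runtime check, written against hypothetical stand-in types rather than the real Lucene/Solr classes ({{Doc}}, {{Field}}, {{SmallerSideCopy}} and {{convert}} are all made up for illustration). Note the stand-in {{Doc}} keeps a by-name index so that a lookup by field name is cheap; that is an assumption of this sketch, and the "loop over the requested names" branch only pays off if the real lookup is similarly cheap.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SmallerSideCopy {
  /** Hypothetical stand-in for a stored field: a name plus a value. */
  static final class Field {
    final String name;
    final Object value;
    Field(String name, Object value) { this.name = name; this.value = value; }
  }

  /** Hypothetical stand-in for a document, with an assumed by-name index. */
  static final class Doc {
    private final List<Field> fields = new ArrayList<>();
    private final Map<String, List<Field>> byName = new LinkedHashMap<>();

    void add(String name, Object value) {
      Field f = new Field(name, value);
      fields.add(f);
      byName.computeIfAbsent(name, k -> new ArrayList<>()).add(f);
    }
    List<Field> getFields() { return fields; }
    List<Field> getFields(String name) {
      return byName.getOrDefault(name, Collections.emptyList());
    }
  }

  /** Copies only the wanted fields, iterating over whichever side is smaller. */
  static Map<String, List<Object>> convert(Doc doc, Set<String> wanted) {
    Map<String, List<Object>> out = new LinkedHashMap<>();
    if (wanted.size() < doc.getFields().size()) {
      // Few names requested: look each one up instead of scanning every field.
      for (String name : wanted) {
        for (Field f : doc.getFields(name)) {
          out.computeIfAbsent(name, k -> new ArrayList<>()).add(f.value);
        }
      }
    } else {
      // Few fields in the doc: scan them and keep only the requested ones.
      for (Field f : doc.getFields()) {
        if (wanted.contains(f.name)) {
          out.computeIfAbsent(f.name, k -> new ArrayList<>()).add(f.value);
        }
      }
    }
    return out;
  }
}
```

Either branch produces the same result; the check only decides which collection drives the loop, so a doc with 1000 stored fields and an fl of one name costs one lookup instead of 1000 iterations.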

Regardless of what changes we make: we should have a whitebox test of 
{{convertLuceneDocToSolrDoc}} (or something we refactor it into) confirming 
that:
 * the resulting {{SolrDocument}} doesn't contain *any* fields that aren't 
needed
 * some explicitly un-requested "lazy" IndexableFields in the input Document 
must still be "lazy" (i.e. not "actualized") when the method returns (i.e. that 
we didn't do a disk read we didn't need)
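
The two properties above could be checked along these lines. This is only a sketch with made-up stand-ins ({{LazyField}}, {{hasBeenLoaded()}}, {{convert}} and {{check}} are hypothetical here, not the real Solr/Lucene test fixtures), and it uses plain checks instead of a test framework to stay self-contained.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

final class LazyFieldWhiteboxCheck {
  /** Hypothetical lazy field: the value is loaded on the first stringValue() call. */
  static final class LazyField {
    private final String name;
    private final Supplier<String> loader;
    private String value;
    private boolean loaded;

    LazyField(String name, Supplier<String> loader) {
      this.name = name;
      this.loader = loader;
    }
    String name() { return name; }
    String stringValue() {
      if (!loaded) {
        value = loader.get(); // stands in for the disk read
        loaded = true;
      }
      return value;
    }
    boolean hasBeenLoaded() { return loaded; }
  }

  /** Conversion under test: copies, and therefore loads, only the requested fields. */
  static Map<String, String> convert(List<LazyField> doc, Set<String> wanted) {
    Map<String, String> out = new LinkedHashMap<>();
    for (LazyField f : doc) {
      if (wanted.contains(f.name())) {
        out.put(f.name(), f.stringValue());
      }
    }
    return out;
  }

  /** Returns normally if both properties hold, otherwise throws AssertionError. */
  static void check() {
    LazyField id = new LazyField("id", () -> "doc1");
    LazyField body = new LazyField("body", () -> "large stored text");
    Set<String> wanted = new HashSet<>(Arrays.asList("id"));

    Map<String, String> out = convert(Arrays.asList(id, body), wanted);

    // Property 1: no fields beyond the requested ones end up in the result.
    if (!out.keySet().equals(wanted)) {
      throw new AssertionError("unexpected fields in result: " + out.keySet());
    }
    // Property 2: the un-requested lazy field was never loaded.
    if (body.hasBeenLoaded()) {
      throw new AssertionError("un-requested lazy field was loaded");
    }
  }
}
```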

> BinaryResponseWriter fetches unnecessary fields
> -----------------------------------------------
>
>                 Key: SOLR-11891
>                 URL: https://issues.apache.org/jira/browse/SOLR-11891
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Response Writers
>    Affects Versions: 5.4, 6.4.2, 6.6.2
>            Reporter: wei wang
>            Priority: Major
>
> We observe that solr query time increases significantly with the number of 
> rows requested, even though all we retrieve for each document is just 
> fl=id,score. We debugged a bit and saw that most of the increased time was 
> spent in BinaryResponseWriter, converting the lucene document into a 
> SolrDocument. Inside convertLuceneDocToSolrDoc():   
> [https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L182]
>  
> I am a bit puzzled why we need to iterate through all the fields in the 
> document. Why can’t we just iterate through the requested field list?    
> [https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L156]
>  
> e.g. when we pass in the field list as 
> sdoc = convertLuceneDocToSolrDoc(doc, rctx.getSearcher().getSchema(), fnames)
> and just iterate through fnames, there is a significant performance boost in 
> our case.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
