[
https://issues.apache.org/jira/browse/SOLR-11891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16399802#comment-16399802
]
Hoss Man commented on SOLR-11891:
---------------------------------
bq. FYI I attached the diff we made to DocsStreamer.
Thanks wei -- unfortunately your patch breaks compilation of some other classes
(on master), and it also suffers from an NPE in the case where globs are used (ex:
{{fl=\*}}).
I started down the road of a more "optimized" patch with what I suggested
above...
bq. I think the ideal "fix" is that the SolrReturnFields.getLuceneFieldNames()
should get passed down all the way into convertLuceneDocToSolrDoc (or something
we refactor it into) such that we do a runtime check of which list is smaller:
SolrReturnFields.getLuceneFieldNames() or Document.getFields() – and then loop
over that (smallest) list.
...and I've currently got a patch which implements this, along with a whitebox
test to assert that the "optimization" is being used -- but while working on it
I realized this isn't actually an optimization...
{code}
for (String fname : returnFieldNames) {
  for (IndexableField f : doc.getFields(fname)) {
    // do stuff
  }
}
{code}
The problem is that {{Document}} isn't a Map -- it doesn't have efficient
lookup of the values associated with a field name. In order to do the
{{fieldname=>value[]}} lookup of {{doc.getFields(fname)}}, it has to do an
iterative scan of all of the internal {{IndexableField}} instances (it can't
even short-circuit out when it finds one, because there could be multiple
fields with the same name, and there's no guarantee they are in a predictable
order).
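For context, this is roughly what each {{doc.getFields(fname)}} lookup has to
do internally (a simplified sketch of the behaviour, not the actual Lucene
source; the helper name is just for illustration):
{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexableField;

// Simplified sketch: there is no name=>values map inside Document, so every
// per-name lookup is a linear scan over *all* stored IndexableFields, and it
// can't stop at the first hit because the same name may occur multiple times.
static List<IndexableField> fieldsByName(Document doc, String fname) {
  List<IndexableField> matches = new ArrayList<>();
  for (IndexableField f : doc.getFields()) {
    if (f.name().equals(fname)) {
      matches.add(f);
    }
  }
  return matches;
}
{code}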
So with this "optimization" we're actually introducing *more* loops over all
the {{IndexableField}} instances.
The key reason wei was probably able to see an improvement with the change
mentioned is that, by the time {{convertLuceneDocToSolrDoc}} is done,
the final {{SolrDocument}} is as small as possible, so the *subsequent* scans
in the ResponseWriter are faster.
We should be able to accomplish the same speed-up "safely" by ensuring that
when we do loop over the {{IndexableField}} instances, we check
{{ReturnFields.wantsField(fname)}}.
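Roughly the shape I have in mind (a sketch only -- it assumes {{returnFields}}
is the request's {{ReturnFields}} and {{out}} is the {{SolrDocument}} being
built; the actual per-field value conversion in DocsStreamer is elided):
{code}
// Single pass over the stored fields, skipping anything fl didn't request.
for (IndexableField f : doc.getFields()) {
  if (!returnFields.wantsField(f.name())) {
    continue; // not requested -- don't copy it into the SolrDocument
  }
  // ... existing value conversion, then out.addField(f.name(), value) ...
}
{code}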
I'll work on a revised (and much simpler) patch tomorrow.
> BinaryResponseWriter fetches unnecessary fields
> -----------------------------------------------
>
> Key: SOLR-11891
> URL: https://issues.apache.org/jira/browse/SOLR-11891
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Response Writers
> Affects Versions: 5.4, 6.4.2, 6.6.2
> Reporter: wei wang
> Priority: Major
> Attachments: DocsStreamer.java.diff, SOLR-11891.patch.BAD
>
>
> We observe that Solr query time increases significantly with the number of
> rows requested, even when all we retrieve for each document is just fl=id,score.
> Debugged a bit and saw that most of the increased time was spent in
> BinaryResponseWriter, converting lucene document into SolrDocument. Inside
> convertLuceneDocToSolrDoc():
> [https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L182]
>
> I am a bit puzzled why we need to iterate through all the fields in the
> document. Why can’t we just iterate through the requested field list?
> [https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L156]
>
> e.g. when we pass in the field list as
> sdoc = convertLuceneDocToSolrDoc(doc, rctx.getSearcher().getSchema(), fnames)
> and just iterate through fnames, there is a significant performance boost in
> our case.