[ 
https://issues.apache.org/jira/browse/SOLR-11891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16401262#comment-16401262
 ] 

Hoss Man commented on SOLR-11891:
---------------------------------

bq. We should be able to accomplish the same speed up "safely" by ensuring that 
when we do loop over the IndexableField instances, we check 
ReturnField.wantsField(fname)

It turns out this is actually more nuanced...

# {{wantsField(String)}} only returns true when the *client* wants the field -- 
not any time the field is needed for other purposes (like transformers).  The 
correct logic is to do a hash lookup on the Set from 
{{ReturnFields.getLuceneFieldNames()}}
# Once I made these changes and started doing more testing, I uncovered 2 
independent bugs in some DocTransformers that currently depend on the 
existing sloppy code in {{convertLuceneDocToSolrDoc}} *AND* the 
{{documentCache}} being enabled in order to function properly...
#* {{ChildDocTransformerFactory}} assumes the uniqueKey field is always going 
to be available in the converted {{SolrDocument}} w/o explicitly asking for it 
in a {{getExtraRequestFields()}} impl
#* {{RawValueTransformerFactory}} assumes it doesn't need to do anything if the 
{{wt}} doesn't match its configured type, and that those fields will 
implicitly be in the {{SolrDocument}} and get returned to the user 
automatically (as regular String values)
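
To illustrate the distinction in point 1, here is a minimal standalone sketch. It is not the attached patch and does not use Solr's classes: {{RequestedFields}} is a hypothetical stand-in that only mirrors the two method names mentioned above, and the stored document is modeled as a plain {{Map}}. The assumption is that the set returned by {{getLuceneFieldNames()}} holds the client's {{fl}} fields plus whatever fields transformers have asked for.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class FieldFilterSketch {

    /** Hypothetical stand-in for a return-fields object; NOT Solr's actual class. */
    static final class RequestedFields {
        private final Set<String> clientFields;  // fields the client named in fl
        private final Set<String> luceneFields;  // fl fields plus transformer-requested fields

        RequestedFields(Set<String> clientFields, Set<String> transformerExtraFields) {
            this.clientFields = new HashSet<>(clientFields);
            this.luceneFields = new HashSet<>(clientFields);
            this.luceneFields.addAll(transformerExtraFields);
        }

        /** True only when the client itself asked for the field. */
        boolean wantsField(String name) {
            return clientFields.contains(name);
        }

        /** Everything that must be read from the stored document. */
        Set<String> getLuceneFieldNames() {
            return luceneFields;
        }
    }

    /** Copies only the needed stored fields, using one hash lookup per field. */
    static Map<String, Object> convert(Map<String, Object> storedDoc, RequestedFields rf) {
        Set<String> needed = rf.getLuceneFieldNames();
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : storedDoc.entrySet()) {
            if (needed.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("id", "doc1");
        stored.put("name", "Widget");
        stored.put("description", "a large stored field nobody asked for");

        // Client asked for fl=name; a (hypothetical) transformer also needs "id".
        RequestedFields rf = new RequestedFields(
                new HashSet<>(Arrays.asList("name")),
                new HashSet<>(Arrays.asList("id")));

        System.out.println("wantsField(id) = " + rf.wantsField("id"));
        System.out.println("converted = " + convert(stored, rf));
    }
}
```

In this sketch, filtering with {{wantsField("id")}} would drop the {{id}} field that the transformer depends on, while the set lookup keeps it and still skips {{description}}. This also shows why a transformer that never declares its extra fields (the first sub-bug above) would stop working once the conversion loop becomes strict.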

I'm attaching a patch that fixes all of these, and creating new linked issues 
to track the sub-bugs.

I'll do some more hammering on tests and aim to commit soon unless people have 
concerns.


> BinaryResponseWriter fetches unnecessary fields
> -----------------------------------------------
>
>                 Key: SOLR-11891
>                 URL: https://issues.apache.org/jira/browse/SOLR-11891
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Response Writers
>    Affects Versions: 5.4, 6.4.2, 6.6.2
>            Reporter: wei wang
>            Priority: Major
>         Attachments: DocsStreamer.java.diff, SOLR-11891.patch.BAD
>
>
> We observe that Solr query time increases significantly with the number of 
> rows requested, even when all we retrieve for each document is just 
> fl=id,score. I debugged a bit and saw that most of the increased time was 
> spent in BinaryResponseWriter, converting the Lucene document into a 
> SolrDocument. Inside convertLuceneDocToSolrDoc():
> [https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L182]
>  
> I am a bit puzzled why we need to iterate through all the fields in the 
> document. Why can’t we just iterate through the requested field list?    
> [https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L156]
>  
> e.g. when we pass in the field list as 
> sdoc = convertLuceneDocToSolrDoc(doc, rctx.getSearcher().getSchema(), fnames)
> and just iterate through fnames, there is a significant performance boost in 
> our case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
