[
https://issues.apache.org/jira/browse/SOLR-11891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16399802#comment-16399802
]
Hoss Man commented on SOLR-11891:
---------------------------------
bq. FYI I attached the diff we made to DocsStreamer.
Thanks wei -- unfortunately your patch breaks compilation of some other classes
(on master), and it also suffers from an NPE in the case where globs are used (ex:
{{fl=\*}}).
I started down the road of a more "optimized" patch with what I suggested
above...
bq. I think the ideal "fix" is that the SolrReturnFields.getLuceneFieldNames()
should get passed down all the way into convertLuceneDocToSolrDoc (or something
we refactor it into) such that we do a runtime check of which list is smaller:
SolrReturnFields.getLuceneFieldNames() or Document.getFields() – and then loop
over that (smallest) list.
...and I've currently got a patch which implements this, along with a whitebox
test to assert that the "optimization" is being used -- but while working on it
I realized this isn't actually an optimization...
{code}
for (String fname : returnFieldNames) {
  for (IndexableField f : doc.getFields(fname)) {
    // do stuff
  }
}
{code}
The problem is that {{Document}} isn't a Map -- it doesn't have efficient
lookup of the values associated with a field name. In order to do the
{{fieldname=>value[]}} lookup of {{doc.getFields(fname)}}, it has to do an
iterative scan of all of the internal {{IndexableField}} instances (it can't
even short-circuit out when it finds one, because there could be multiple
fields with the same name, and there's no guarantee they are in a predictable
order).
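For context, this is roughly what each {{doc.getFields(fname)}} lookup has to
do internally (a simplified sketch of the behaviour, not the actual Lucene
source; the helper name is just for illustration):
{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexableField;

// Simplified sketch: there is no name=>values map inside Document, so every
// per-name lookup is a linear scan over *all* stored IndexableFields, and it
// can't stop at the first hit because the same name may occur multiple times.
static List<IndexableField> fieldsByName(Document doc, String fname) {
  List<IndexableField> matches = new ArrayList<>();
  for (IndexableField f : doc.getFields()) {
    if (f.name().equals(fname)) {
      matches.add(f);
    }
  }
  return matches;
}
{code}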
So with this "optimization" we're actually introducing *more* loops over all
the {{IndexableField}} instances.
The key reason wei was probably able to see an improvement with the change
mentioned is that, by the time {{convertLuceneDocToSolrDoc}} is done,
the final {{SolrDocument}} is as small as possible, so the *subsequent* scans
in the ResponseWriter are faster.
We should be able to accomplish the same speed-up "safely" by ensuring that
when we do loop over the {{IndexableField}} instances, we check
{{ReturnFields.wantsField(fname)}}.
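Roughly the shape I have in mind (a sketch only -- it assumes {{returnFields}}
is the request's {{ReturnFields}} and {{out}} is the {{SolrDocument}} being
built; the actual per-field value conversion in DocsStreamer is elided):
{code}
// Single pass over the stored fields, skipping anything fl didn't request.
for (IndexableField f : doc.getFields()) {
  if (!returnFields.wantsField(f.name())) {
    continue; // not requested -- don't copy it into the SolrDocument
  }
  // ... existing value conversion, then out.addField(f.name(), value) ...
}
{code}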
I'll work on a revised (and much simpler) patch tomorrow.
> BinaryResponseWriter fetches unnecessary fields
> -----------------------------------------------
>
> Key: SOLR-11891
> URL: https://issues.apache.org/jira/browse/SOLR-11891
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Response Writers
> Affects Versions: 5.4, 6.4.2, 6.6.2
> Reporter: wei wang
> Priority: Major
> Attachments: DocsStreamer.java.diff, SOLR-11891.patch.BAD
>
>
> We observe that Solr query time increases significantly with the number of
> rows requested, even when all we retrieve for each document is just fl=id,score.
> Debugged a bit and saw that most of the increased time was spent in
> BinaryResponseWriter, converting lucene document into SolrDocument. Inside
> convertLuceneDocToSolrDoc():
> [https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L182]
>
> I am a bit puzzled why we need to iterate through all the fields in the
> document. Why can’t we just iterate through the requested field list?
> [https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L156]
>
> e.g. when we pass in the field list as
> sdoc = convertLuceneDocToSolrDoc(doc, rctx.getSearcher().getSchema(), fnames)
> and just iterate through fnames, there is a significant performance boost in
> our case.