[
https://issues.apache.org/jira/browse/SOLR-11891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338009#comment-16338009
]
Hoss Man commented on SOLR-11891:
---------------------------------
NOTE: This Jira was opened in response to a mailing list thread...
[https://lists.apache.org/thread.html/%3CCAL1=wh0uoo1j01d-froath_ea9cvn9cm44n1cgjerdqrtku...@mail.gmail.com%3E]
bq. I believe that after {{Document doc = docFetcher.doc(id, fnames)}} Lucene's
Document contains only the requested fields.
My comments from that thread requesting this jira...
{noformat}
I have a hunch here -- but i haven't verified it.
First of all: the specific code in question that you mention assumes it
doesn't *need* to filter out the result of "doc.getFields()" based on the
'fl' because at the point in the processing where the DocsStreamer is
looping over the result of "doc.getFields()" the "Document" object it's
dealing with *should* only contain the specific (subset of stored) fields
requested by the fl param -- this is handled by RetrieveFieldsOptimizer &
SolrDocumentFetcher that the DocsStreamer builds up according to the
results of ResultContext.getReturnFields() when asking the
SolrIndexSearcher to fetch the doc()
But i think what's happening here is that because of the documentCache,
there are cases where the SolrIndexSearcher is not actually using
a SolrDocumentStoredFieldVisitor to limit what's requested from the
IndexReader, and the resulting Document contains all fields -- which is
then compounded by code that loops over every field.
At a quick glance, I'm a little fuzzy on how exactly
enableLazyFieldLoading may/may-not be affecting things here, but either
way I think you are correct -- we can/should make this overall stack of
code smarter about looping over fields we know we want, vs looping over
all fields in the doc.
Can you please file a jira for this?
{noformat}
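The hunch quoted above can be illustrated with a toy model. This is only a sketch in plain Java, not Solr code: the class {{CachedFetchSketch}}, its {{doc}} method, and the map-based "documents" are hypothetical stand-ins, and the behavior it models (a cache-enabled fetch returning the whole document regardless of the requested field names) is exactly the part that has not been verified.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy model (NOT Solr code) of a document fetch with and without a document cache. */
class CachedFetchSketch {
    // Hypothetical stand-in for the stored fields of each document, keyed by doc id.
    private final Map<Integer, Map<String, Object>> index;
    // Hypothetical stand-in for a document cache; null means the cache is disabled.
    private final Map<Integer, Map<String, Object>> cache;

    CachedFetchSketch(Map<Integer, Map<String, Object>> index, boolean cacheEnabled) {
        this.index = index;
        this.cache = cacheEnabled ? new HashMap<>() : null;
    }

    /** Returns the document for id; fnames is the set of requested field names. */
    Map<String, Object> doc(int id, Set<String> fnames) {
        if (cache != null) {
            // Assumed behavior: the cache holds whole documents, so fnames is ignored
            // and the caller receives every stored field.
            return cache.computeIfAbsent(id, index::get);
        }
        Map<String, Object> stored = index.get(id);
        if (stored == null) {
            return null;
        }
        // Without the cache, only the requested fields are copied out.
        Map<String, Object> filtered = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : stored.entrySet()) {
            if (fnames.contains(e.getKey())) {
                filtered.put(e.getKey(), e.getValue());
            }
        }
        return filtered;
    }
}
```

In this model, any caller that loops over every field of the returned document does work proportional to the number of stored fields whenever the cache is enabled, even if only one field was requested.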
bq. ... I'm not sure but this can be related somehow, when result [transformer]
is used it may refer to any field, so it might impact fl filtering, ...
I don't think so -- the DocTransformer API has {{getExtraRequestFields()}}
explicitly for this purpose, so that the ReturnFields structure (used by
DocsStreamer & RetrieveFieldsOptimizer) should already know exactly which fields
are needed -- even by the transformers.
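As a rough illustration of that idea, the sketch below merges the fl field names with the extra fields a transformer asks for, then copies only those names out of a document instead of looping over every field. It is plain Java, not Solr code: {{RequestedFieldsSketch}}, {{wantedFields}}, and {{copyWanted}} are hypothetical names, the map stands in for a document, and treating a null array as "no extra fields" is an assumption of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Toy model (NOT Solr code) of building the wanted-field set and iterating over it. */
class RequestedFieldsSketch {

    /** Union of the fl field names and the extra fields requested by a transformer. */
    static Set<String> wantedFields(Set<String> flFields, String[] extraRequestFields) {
        Set<String> wanted = new LinkedHashSet<>(flFields);
        if (extraRequestFields != null) { // assumption: null means no extra fields
            for (String f : extraRequestFields) {
                wanted.add(f);
            }
        }
        return wanted;
    }

    /** Copies only the wanted fields; cost scales with wanted.size(), not doc.size(). */
    static Map<String, Object> copyWanted(Map<String, Object> doc, Set<String> wanted) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String name : wanted) {
            Object value = doc.get(name);
            if (value != null) {
                out.put(name, value);
            }
        }
        return out;
    }
}
```

With a document holding hundreds of stored fields and fl=id,score, a loop of this shape touches only the handful of wanted names, which is the effect the reporter describes when iterating through fnames.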
> BinaryResponseWriter fetches unnecessary fields
> -----------------------------------------------
>
> Key: SOLR-11891
> URL: https://issues.apache.org/jira/browse/SOLR-11891
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Response Writers
> Affects Versions: 5.4, 6.4.2, 6.6.2
> Reporter: wei wang
> Priority: Major
>
> We observe that Solr query time increases significantly with the number of
> rows requested, even when all we retrieve for each document is fl=id,score.
> Debugging showed that most of the added time is spent in
> BinaryResponseWriter, converting the Lucene document into a SolrDocument inside
> convertLuceneDocToSolrDoc():
> [https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L182]
>
> I am a bit puzzled why we need to iterate through all the fields in the
> document. Why can’t we just iterate through the requested field list?
> [https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L156]
>
> For example, when we pass in the field list as
> sdoc = convertLuceneDocToSolrDoc(doc, rctx.getSearcher().getSchema(), fnames)
> and iterate only through fnames, we see a significant performance boost in
> our case.