[
https://issues.apache.org/jira/browse/SOLR-11891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16338342#comment-16338342
]
Hoss Man commented on SOLR-11891:
---------------------------------
{quote}convertLuceneDocToSolrDoc should be innocent enough; it merely iterates
the IndexableField instances in a collection and potentially calls name() on
them which is always a simple getter. ...
{quote}
That iteration over all fields (real and lazy) is in and of itself the problem
being reported. If the docs contain 1000 stored fields, but only 1 is
requested in the 'fl', then the current code loops over all 1000 fields in
every doc even though it knows exactly which fields it needs. Even if there
are no disk reads involved for some lazy fields, it's still a wasteful
iteration whose cost is multiplicative in the # of fields in the docs and the
number of docs in the response, regardless of how small the fl is.
{quote}...We can't avoid putting all fields on the SolrDocument because of the
potential for a document transformer to need it – and that's not known by
RetrieveFieldsOptimizer.
{quote}
IIUC we *can* avoid it and RetrieveFieldsOptimizer *does* know that – as i
mentioned in my response to mk: that's the entire point of
{{DocTransformer.getExtraRequestFields()}} (see the javadocs) which is used to
build up the list returned by {{SolrReturnFields.getLuceneFieldNames()}}
{quote}But it should be innocent enough because if no such transformer requests
the value, then it shouldn't actually be loaded (it's lazy).
{quote}
Even if the {{Document}} field values are lazy, the existing code that loops
over all of them is still building up a {{SolrDocument}} that contains all of
those (lazy) fields, wasting time and a small amount of space. And that
assumes they are all lazy: lazy loading is an option, and it may not be on for
some people. If/when the fields are not lazy, reading them from disk takes up
even more time & space.
----
I think the ideal "fix" is that the {{SolrReturnFields.getLuceneFieldNames()}}
should get passed down all the way into {{convertLuceneDocToSolrDoc}} (or
something we refactor it into) such that we do a runtime check of which list
is smaller: {{SolrReturnFields.getLuceneFieldNames()}} or
{{Document.getFields()}}, and then loop over that (smaller) list.
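To make the "loop over the smaller list" idea concrete, here is a minimal sketch in plain Java. It is not the Solr code: {{SmallerSideIterationSketch}} and {{selectFields}} are hypothetical names, and a {{Map}} stands in for the Lucene {{Document}} on the assumption that a field can be looked up cheaply by name. Whether the real {{Document}} offers that is not addressed here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: plain collections stand in for the Lucene/Solr types.
final class SmallerSideIterationSketch {

    /**
     * Copies only the wanted fields, looping over whichever side is smaller.
     * storedFields maps field name -> values (assumption: cheap by-name lookup).
     */
    static Map<String, List<Object>> selectFields(
            Map<String, List<Object>> storedFields, Set<String> wanted) {
        Map<String, List<Object>> out = new LinkedHashMap<>();
        if (wanted.size() < storedFields.size()) {
            // Few requested names: probe the document once per requested name.
            for (String name : wanted) {
                List<Object> values = storedFields.get(name);
                if (values != null) {
                    out.put(name, values);
                }
            }
        } else {
            // Few stored fields: walk them and keep only the requested ones.
            for (Map.Entry<String, List<Object>> e : storedFields.entrySet()) {
                if (wanted.contains(e.getKey())) {
                    out.put(e.getKey(), e.getValue());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A document with 1000 stored fields plus "id"; only "id" is requested.
        Map<String, List<Object>> doc = new LinkedHashMap<>();
        for (int i = 0; i < 1000; i++) {
            doc.put("field_" + i, Arrays.<Object>asList("value_" + i));
        }
        doc.put("id", Arrays.<Object>asList("doc-1"));
        Set<String> wanted = new HashSet<>(Arrays.asList("id"));
        // Prints the names of the fields that were copied.
        System.out.println(selectFields(doc, wanted).keySet());
    }
}
```

Either branch produces the same set of fields; only the amount of iteration differs, which is the cost being discussed above.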
Regardless of what changes we make: we should have a whitebox test of
{{convertLuceneDocToSolrDoc}} (or something we refactor it into) confirming
that:
* the resulting {{SolrDocument}} doesn't contain *any* fields that aren't
needed
* some explicitly un-requested "lazy" IndexableFields in the input Document
must still be "lazy" (ie: not "actualized") when the method returns (ie: that
we didn't do a disk read we didn't need)
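A rough shape for such a whitebox check, sketched in plain Java without JUnit or any Solr classes. Everything here is a hypothetical stand-in: {{LazyField}} only simulates a lazily loaded field by recording whether its value was ever read, {{convert}} is a placeholder for {{convertLuceneDocToSolrDoc}} (or whatever it is refactored into), and multi-valued fields are ignored for brevity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the two assertions; not the real Solr test.
final class LazyFieldCheckSketch {

    /** Stand-in for a lazy stored field: reading the value marks it loaded. */
    static final class LazyField {
        private final String name;
        private final String storedValue;
        private boolean loaded = false;

        LazyField(String name, String storedValue) {
            this.name = name;
            this.storedValue = storedValue;
        }

        String name() { return name; }           // cheap getter, no load

        String value() {                          // simulates the disk read
            loaded = true;
            return storedValue;
        }

        boolean isLoaded() { return loaded; }
    }

    /** Placeholder conversion: copies only the wanted fields. */
    static Map<String, String> convert(List<LazyField> fields, Set<String> wanted) {
        Map<String, String> out = new LinkedHashMap<>();
        for (LazyField f : fields) {
            if (wanted.contains(f.name())) {
                out.put(f.name(), f.value());
            }
        }
        return out;
    }

    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        LazyField id = new LazyField("id", "doc-1");
        LazyField big = new LazyField("big_text", "lots of text");
        Set<String> wanted = new HashSet<>(Arrays.asList("id"));

        Map<String, String> result = convert(Arrays.asList(id, big), wanted);

        // 1) the result contains no fields that were not needed
        check(result.keySet().equals(wanted), "unexpected fields: " + result.keySet());
        // 2) the un-requested lazy field was never read
        check(!big.isLoaded(), "un-requested lazy field was loaded");
    }
}
```

A real test would assert the same two properties against actual lazy IndexableFields rather than this simulated flag.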
> BinaryResponseWriter fetches unnecessary fields
> -----------------------------------------------
>
> Key: SOLR-11891
> URL: https://issues.apache.org/jira/browse/SOLR-11891
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Response Writers
> Affects Versions: 5.4, 6.4.2, 6.6.2
> Reporter: wei wang
> Priority: Major
>
> We observe that solr query time increases significantly with the number of
> rows requested, even when all we retrieve for each document is just fl=id,score.
> Debugged a bit and see that most of the increased time was spent in
> BinaryResponseWriter, converting lucene document into SolrDocument. Inside
> convertLuceneDocToSolrDoc():
> [https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L182]
>
> I am a bit puzzled why we need to iterate through all the fields in the
> document. Why can’t we just iterate through the requested field list?
> [https://github.com/apache/lucene-solr/blob/df874432b9a17b547acb24a01d3491839e6a6b69/solr/core/src/java/org/apache/solr/response/DocsStreamer.java#L156]
>
> e.g. when we pass in the field list as
> sdoc = convertLuceneDocToSolrDoc(doc, rctx.getSearcher().getSchema(), fnames)
> and just iterate through fnames, there is a significant performance boost in
> our case.