[
https://issues.apache.org/jira/browse/SOLR-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360002#comment-15360002
]
ASF GitHub Bot commented on SOLR-8858:
--------------------------------------
Github user maedhroz commented on a diff in the pull request:
https://github.com/apache/lucene-solr/pull/47#discussion_r69373171
--- Diff: solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java ---
@@ -910,8 +910,12 @@ protected void createMainQuery(ResponseBuilder rb) {
additionalAdded = addFL(additionalFL, "score", additionalAdded);
}
} else {
- // reset so that only unique key is requested in shard requests
- sreq.params.set(CommonParams.FL, rb.req.getSchema().getUniqueKeyField().getName());
+ if (rb.req.getSearcher().enableLazyFieldLoading) {
+ // reset so that only unique key is requested in shard requests
+ sreq.params.set(CommonParams.FL, rb.req.getSchema().getUniqueKeyField().getName());
+ } else {
+ sreq.params.set(CommonParams.FL, "*");
--- End diff --
In the current master (without my patch), the query stage shard request for
join in `DistribJoinFromCollectionTest` will pull the document from
`SolrIndexSearcher#doc()` with only `id` in the specified `fields`. It does not
use lazy field loading, and so uses a `DocumentStoredFieldVisitor` with no
`fields` specified to load the whole document, and then put it in the
`documentCache`. If we used lazy field loading, the cached document would still
have some representation of all the fields, albeit lazy ones.
With only the `SolrIndexSearcher` piece of my patch in place, the
`TestSubQueryTransformer` failures are easy to avoid, and I was able to fix
them by simply reading the JavaDoc. (See the
[comment](https://github.com/apache/lucene-solr/pull/47/files/4f9e67c63ce5130795df647ef5e86ae970601cb6#r69015716)
below.) `DistribJoinFromCollectionTest` (and `TestCloudDeleteByQuery`) fail
because when, as I've laid out above, `doc()` actually respects the `fields`
list during the main query phase, it caches a document that *only contains
those fields*. When the actual field retrieval stage of the query hits the
shard, `doc()` returns a cached document that doesn't have all the fields in
`fl`. (I'm not sure either `DistribJoinFromCollectionTest` or
`TestCloudDeleteByQuery` is doing anything wrong, and they actually *pass* if
they enable lazy field loading.)
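To make the failure mode concrete, here is a toy model of it. This is not Solr
code: `PartialDocCacheSketch`, the `STORED` map, and the plain `HashMap`
standing in for the `documentCache` are all made up for illustration. The only
thing it is meant to show is that once `doc()` respects `fields` on a cache
miss and caches the result, a later call with a wider `fl` gets the partial
document back.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of the caching hazard described above; not Solr code. */
class PartialDocCacheSketch {
    // Hypothetical stand-in for the stored fields of a single document.
    static final Map<String, String> STORED =
            Map.of("id", "1", "name", "foo", "price", "10");

    // Hypothetical stand-in for the documentCache: doc id -> loaded fields.
    final Map<Integer, Map<String, String>> documentCache = new HashMap<>();

    /** Honors `fields` on a cache miss and caches whatever was loaded. */
    Map<String, String> doc(int docId, Set<String> fields) {
        Map<String, String> cached = documentCache.get(docId);
        if (cached != null) {
            return cached; // a hit returns the cached document, even if partial
        }
        Map<String, String> loaded = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : STORED.entrySet()) {
            // A null field set means "load everything".
            if (fields == null || fields.contains(e.getKey())) {
                loaded.put(e.getKey(), e.getValue());
            }
        }
        documentCache.put(docId, loaded);
        return loaded;
    }

    public static void main(String[] args) {
        PartialDocCacheSketch searcher = new PartialDocCacheSketch();
        // Query stage: only the unique key is requested.
        searcher.doc(0, Set.of("id"));
        // Field retrieval stage: fl=id,name, but the cached partial document wins.
        Map<String, String> result = searcher.doc(0, Set.of("id", "name"));
        System.out.println("has name: " + result.containsKey("name"));
    }
}
```

In this model the second call comes back without `name`, which is the same
shape of failure the two tests hit.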
The reason I raised this issue in the first place is that I have a custom
`StoredFieldsVisitor` that relies on `DocumentStoredFieldVisitor` providing the
fields requested by the query. The unfortunate thing is that I think the
`QueryComponent` bit of this PR isn't actually compatible with that, and I
think that will need to be reverted no matter what. The only other ways I can
imagine fixing this are:
a.) Always cache an entire document, regardless of what we return from
`doc()`. (Seems like it adds overhead.)
b.) Skip caching under certain conditions, like if the `fields` list only
contains the unique key (or key and score). (Seems very reliant on
`QueryComponent` still.)
c.) Always use lazy loading. (Seems invasive, but most of the examples I
see use it anyway.)
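
A rough sketch of what (a) and (b) could look like, using the same kind of toy
model rather than real Solr classes. `DocCacheOptionsSketch`, `docCacheWhole`,
and `docSkipCacheForKeyOnly` are hypothetical names, and for (b) I'm assuming
that any load that is not key-only still reads and caches the whole document.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of options (a) and (b); names are hypothetical, not Solr APIs. */
class DocCacheOptionsSketch {
    static final String UNIQUE_KEY = "id";
    // Hypothetical stand-in for the stored fields of a single document.
    static final Map<String, String> STORED =
            Map.of("id", "1", "name", "foo", "price", "10");

    final Map<Integer, Map<String, String>> documentCache = new HashMap<>();

    /** Copies only the requested fields; a null set means "all fields". */
    static Map<String, String> filter(Map<String, String> doc, Set<String> fields) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : doc.entrySet()) {
            if (fields == null || fields.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    /** Option (a): always cache the whole document, filter what is returned. */
    Map<String, String> docCacheWhole(int docId, Set<String> fields) {
        Map<String, String> full =
                documentCache.computeIfAbsent(docId, k -> new LinkedHashMap<>(STORED));
        return filter(full, fields);
    }

    /** Option (b): honor `fields`, but never cache a key-only load. */
    Map<String, String> docSkipCacheForKeyOnly(int docId, Set<String> fields) {
        Map<String, String> cached = documentCache.get(docId);
        if (cached != null) {
            return filter(cached, fields);
        }
        boolean keyOnly = fields != null && fields.equals(Set.of(UNIQUE_KEY));
        if (keyOnly) {
            return filter(STORED, fields); // partial load, deliberately not cached
        }
        // Assumption: every other load reads and caches the full document.
        Map<String, String> full = new LinkedHashMap<>(STORED);
        documentCache.put(docId, full);
        return filter(full, fields);
    }
}
```

Either way the cache only ever holds complete documents, so the field
retrieval stage can't be handed a key-only one. The cost is the extra full
load in (a) and the special-casing of the unique key in (b).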
I don't love any of these options, but I'd be interested to get more
informed opinions.
> SolrIndexSearcher#doc() Completely Ignores Field Filters Unless Lazy Field
> Loading is Enabled
> ---------------------------------------------------------------------------------------------
>
> Key: SOLR-8858
> URL: https://issues.apache.org/jira/browse/SOLR-8858
> Project: Solr
> Issue Type: Bug
> Affects Versions: 4.6, 4.10, 5.5
> Reporter: Caleb Rackliffe
> Labels: easyfix
>
> If {{enableLazyFieldLoading=false}}, a perfectly valid fields filter will be
> ignored, and we'll create a {{DocumentStoredFieldVisitor}} without it.