[GitHub] lucene-solr pull request #47: SOLR-8858 SolrIndexSearcher#doc() Completely I...

maedhroz Fri, 01 Jul 2016 22:38:12 -0700

Github user maedhroz commented on a diff in the pull request:

    https://github.com/apache/lucene-solr/pull/47#discussion_r69373171
  
    --- Diff: 
solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java ---
    @@ -910,8 +910,12 @@ protected void createMainQuery(ResponseBuilder rb) {
             additionalAdded = addFL(additionalFL, "score", additionalAdded);
           }
         } else {
    -      // reset so that only unique key is requested in shard requests
    -      sreq.params.set(CommonParams.FL, 
rb.req.getSchema().getUniqueKeyField().getName());
    +      if (rb.req.getSearcher().enableLazyFieldLoading) {
    +        // reset so that only unique key is requested in shard requests
    +        sreq.params.set(CommonParams.FL, 
rb.req.getSchema().getUniqueKeyField().getName());
    +      } else {
    +        sreq.params.set(CommonParams.FL, "*");
    --- End diff --
    
    In the current master (without my patch), the query stage shard request for 
join in `DistribJoinFromCollectionTest` will pull the document from 
`SolrIndexSearcher#doc()' with only `id` in the specified `fields`. It does not 
use lazy field loading, and so uses a `DocumentStoredFieldVisitor` with no 
`fields` specified to load the whole document, and then put it in the 
`documentCache`. If we used lazy field loading, the cached document would still 
have some representation of all the fields, albeit lazy ones.
    
    With only the `SolrIndexSearcher` piece of my patch in place, the 
`TestSubQueryTransformer` failures are easy to avoidl, and I was able to fix 
them by simply reading the JavaDoc. (See the 
[comment](https://github.com/apache/lucene-solr/pull/47/files/4f9e67c63ce5130795df647ef5e86ae970601cb6#r69015716)
 below.) `DistribJoinFromCollectionTest` (and `TestCloudDeleteByQuery`) fails, 
because when, as I've laid out above, `doc()` actually respects the `fields` 
list during the main query phase, it caches a document that *only contains 
those fields*. When the actual field retrieval stage of the query hits the 
shard, `doc()` spits out a document that doesn't have the all fields in `fl`. 
(I'm not sure `DistribJoinFromCollectionTest` or `TestCloudDeleteByQuery` are 
doing something wrong, and they actually *pass* if they enable lazy field 
loading.)
    
    The reason I raised this issue in the first place is that I have a custom 
`StoredFieldsVisitor` that relies on `DocumentStoredFieldVisitor` providing the 
fields requested by the query. The unfortunate thing is that I think the 
`QueryComponent` bit of this PR isn't actually compatible with that, and I 
think that will need to be reverted no matter what. The only other ways I can 
imagine fixing this are:
    
    a.) Always cache an entire document, regardless of what we return from 
`doc()`. (Seems like it adds overhead.)
    b.) Skip caching under certain conditions, like if the `fields` list only 
contains the unique key (or key and score). (Seems very reliant on 
`QueryComponent` still.)
    c.) Always use lazy loading. (Seems invasive, but most of the examples I 
see use it anyway.)
    
    I don't love any of these options, but I'd be interested to get more 
informed opinions.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] lucene-solr pull request #47: SOLR-8858 SolrIndexSearcher#doc() Completely I...

Reply via email to