[GitHub] [solr] dsmiley commented on a change in pull request #592: SOLR-14765: Optimize DocList creation by skipping sort for sort-irrelevant cases

GitBox Sat, 05 Feb 2022 13:42:29 -0800


dsmiley commented on a change in pull request #592:
URL: https://github.com/apache/solr/pull/592#discussion_r800100152




##########
File path: solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java
##########
@@ -854,8 +868,58 @@ private DocSet getAndCacheDocSet(Query query) throws 
IOException {
     return filterCache.computeIfAbsent(query, q -> getDocSetNC(q, null));
   }
 
-  private static Query matchAllDocsQuery = new MatchAllDocsQuery();
+  private static final Query MATCH_ALL_DOCS_QUERY = new MatchAllDocsQuery();
   private volatile BitDocSet liveDocs;
+  private final FutureTask<BitDocSet> liveDocsFuture = new 
FutureTask<>(this::computeLiveDocs);
+
+  private BitDocSet computeLiveDocs() {

Review comment:
       So you wrote all this low level code to avoid a simple MatchAllDocsQuery 
with a DocSetCollector? See 
org.apache.solr.search.SolrIndexSearcher.getDocSetNC which calls DocSetUtil.  
Even if we choose to keep your low level code, I'd propose it go to a special 
case inside of DocSetUtil when the query is MatchAllDocsQuery.

##########
File path: solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java
##########
@@ -1299,6 +1385,47 @@ public DocList getDocList(Query query, List<Query> 
filterList, Sort lsort, int o
   public static final int GET_DOCLIST = 0x02; // get the documents actually 
returned in a response
   public static final int GET_SCORES = 0x01;
 
+  private static boolean isConstantScoreQuery(Query q) {

Review comment:
       I've seen a need for this elsewhere.  Let's make it handle other cases 
too (e.g. MatchAllDocsQuery, MatchNoDocsQuery, "Filter" (soon to be 
DocSetQuery)) and be recursive to the cases below (hey why not).  This could 
move to QueryUtils rather than having the massive SolrIndexSearcher grow

##########
File path: solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java
##########
@@ -865,9 +929,31 @@ public BitDocSet getLiveDocSet() throws IOException {
     // Going through the filter cache will provide thread safety here if we 
only had getLiveDocs,
     // but the addition of setLiveDocs means we needed to add volatile to 
"liveDocs".
     BitDocSet docs = liveDocs;
-    if (docs == null) {
-      //note: maybe should instead calc manually by segment, using 
FixedBitSet.copyOf(segLiveDocs); avoid filter cache?
-      liveDocs = docs = getDocSetBits(matchAllDocsQuery);
+    if (docs != null) {
+      matchAllDocsCacheHitCount.incrementAndGet();
+    } else {
+      if (matchAllDocsCacheComputationTracker.compareAndSet(Long.MIN_VALUE, 
0)) {
+        // run the initial/only/final future inline
+        // This thread will block execution here and `liveDocsFuture.get()` 
(below) should then return immediately
+        liveDocsFuture.run();
+      } else {
+        // another thread has already called `computeLiveDocs.run()`; this 
thread will block on
+        // `liveDocsFuture.get()` (below)
+        if (matchAllDocsCacheComputationTracker.getAndIncrement() < 0) {

Review comment:
       The Future + AtomicLong thing looks overly complicated; maybe I'm not 
getting why it needs to be so complicated.  Can we have a simple double-check 
lock with synchronized to ensure only one creation happens?  We've all seen 
this pattern many times, I'm sure.  We don't even need the liveDocs to be 
volatile since it's an immutable object, not unlike a String, except for the 
lazy populated size but that doesn't matter as it's idempotent.

##########
File path: solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java
##########
@@ -2281,6 +2455,11 @@ public void initializeMetrics(SolrMetricsContext 
parentContext, String scope) {
     parentContext.gauge(() -> openTime, true, "openedAt", 
Category.SEARCHER.toString(), scope);
     parentContext.gauge(() -> warmupTime, true, "warmupTime", 
Category.SEARCHER.toString(), scope);
     parentContext.gauge(() -> registerTime, true, "registeredAt", 
Category.SEARCHER.toString(), scope);
+    parentContext.gauge(fullSortCount::get, true, "fullSortCount", 
Category.SEARCHER.toString(), scope);
+    parentContext.gauge(skipSortCount::get, true, "skipSortCount", 
Category.SEARCHER.toString(), scope);
+    parentContext.gauge(matchAllDocsCacheConsultationCount::get, true, 
"matchAllDocsCacheConsultationCount", Category.SEARCHER.toString(), scope);
+    parentContext.gauge(matchAllDocsCacheHitCount::get, true, 
"matchAllDocsCacheHitCount", Category.SEARCHER.toString(), scope);
+    parentContext.gauge(matchAllDocsCacheComputationTracker::get, true, 
"matchAllDocsCacheComputationTracker", Category.SEARCHER.toString(), scope);

Review comment:
       Is there value in publishing that "Tracker"?  It doesn't even sound like 
a metric.
   
   If we want these metrics, perhaps we should instead have a one-element 
Cache, which already publishes metrics and in a standard way across other 
Caches.  This is just an idea... if it's not easy then nevermind.
   

##########
File path: solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java
##########
@@ -854,8 +868,58 @@ private DocSet getAndCacheDocSet(Query query) throws 
IOException {
     return filterCache.computeIfAbsent(query, q -> getDocSetNC(q, null));
   }
 
-  private static Query matchAllDocsQuery = new MatchAllDocsQuery();
+  private static final Query MATCH_ALL_DOCS_QUERY = new MatchAllDocsQuery();
   private volatile BitDocSet liveDocs;
+  private final FutureTask<BitDocSet> liveDocsFuture = new 
FutureTask<>(this::computeLiveDocs);
+
+  private BitDocSet computeLiveDocs() {
+    switch (leafContexts.size()) {
+      case 0:
+        assert numDocs() == 0;
+        return new BitDocSet(DocSet.empty().getFixedBitSet(), 0);

Review comment:
       This looks round-about.  Just return `new FixedBitSet(0);`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [solr] dsmiley commented on a change in pull request #592: SOLR-14765: Optimize DocList creation by skipping sort for sort-irrelevant cases

Reply via email to