gsmiller commented on a change in pull request #606:
URL: https://github.com/apache/lucene/pull/606#discussion_r786337003



##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java
##########
@@ -126,23 +126,39 @@ private void countAll(IndexReader reader) throws 
IOException {
 
       NumericDocValues singleValued = DocValues.unwrapSingleton(multiValued);
       if (singleValued != null) {
-        for (int doc = singleValued.nextDoc();
-            doc != DocIdSetIterator.NO_MORE_DOCS;
-            doc = singleValued.nextDoc()) {
-          if (liveDocs != null && liveDocs.get(doc) == false) {
-            continue;
+        if (liveDocs != null) {
+          for (int doc = singleValued.nextDoc();
+              doc != DocIdSetIterator.NO_MORE_DOCS;
+              doc = singleValued.nextDoc()) {
+            if (liveDocs.get(doc)) {
+              values[(int) singleValued.longValue()]++;
+            }
+          }
+        } else {

Review comment:
       >also, is there really a reason anymore to have count vs countAll? They 
look the same to me. The only difference is livedocs check which is shown to do 
nothing? So if we remove livedocs specialization, and remove count-vs-countAll 
specialization, it should start to be a bit more manageable?
   
   The only option I can think of for this is to put the liveDoc checking 
behind a DISI abstraction. Then the implementation could be consolidated to 
just operate on a DISI (which would either be backed by collected hits or by a 
doc value field with liveDocs validation). The nuance here is that the 
"standard" count functionality doesn't need to check for deleted docs as its 
assumes everything in the FacetsCollector is "live," whereas countAll needs to 
check for deleted docs. So this check needs to happen somewhere, unless 
liveDocs is null (indicating there are no deleted docs in the index).
   
   I went ahead and tried this out, but am still seeing pretty horrific qps 
regressions.
   
   ```
                               TaskQPS baseline      StdDevQPS candidate      
StdDev                Pct diff p-value
              BrowseMonthTaxoFacets       29.20     (20.0%)       13.73     
(15.6%)  -53.0% ( -73% -  -21%) 0.000
        BrowseRandomLabelTaxoFacets       18.33     (14.4%)       10.98     
(10.4%)  -40.1% ( -56% -  -17%) 0.000
               BrowseDateTaxoFacets       21.36     (16.4%)       12.98     
(10.6%)  -39.2% ( -56% -  -14%) 0.000
          BrowseDayOfYearTaxoFacets       21.33     (16.4%)       13.04     
(10.5%)  -38.8% ( -56% -  -14%) 0.000
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to