gsmiller commented on a change in pull request #606:
URL: https://github.com/apache/lucene/pull/606#discussion_r786337003
##########
File path:
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FastTaxonomyFacetCounts.java
##########
@@ -126,23 +126,39 @@ private void countAll(IndexReader reader) throws
IOException {
NumericDocValues singleValued = DocValues.unwrapSingleton(multiValued);
if (singleValued != null) {
- for (int doc = singleValued.nextDoc();
- doc != DocIdSetIterator.NO_MORE_DOCS;
- doc = singleValued.nextDoc()) {
- if (liveDocs != null && liveDocs.get(doc) == false) {
- continue;
+ if (liveDocs != null) {
+ for (int doc = singleValued.nextDoc();
+ doc != DocIdSetIterator.NO_MORE_DOCS;
+ doc = singleValued.nextDoc()) {
+ if (liveDocs.get(doc)) {
+ values[(int) singleValued.longValue()]++;
+ }
+ }
+ } else {
Review comment:
>also, is there really a reason anymore to have count vs countAll? They
look the same to me. The only difference is livedocs check which is shown to do
nothing? So if we remove livedocs specialization, and remove count-vs-countAll
specialization, it should start to be a bit more manageable?
The only option I can think of for this is to put the liveDoc checking
behind a DISI abstraction. Then the implementation could be consolidated to
just operate on a DISI (which would either be backed by collected hits or by a
doc value field with liveDocs validation). The nuance here is that the
"standard" count functionality doesn't need to check for deleted docs as its
assumes everything in the FacetsCollector is "live," whereas countAll needs to
check for deleted docs. So this check needs to happen somewhere, unless
liveDocs is null (indicating there are no deleted docs in the index).
I went ahead and tried this out, but am still seeing pretty horrific qps
regressions.
```
TaskQPS baseline StdDevQPS candidate
StdDev Pct diff p-value
BrowseMonthTaxoFacets 29.20 (20.0%) 13.73
(15.6%) -53.0% ( -73% - -21%) 0.000
BrowseRandomLabelTaxoFacets 18.33 (14.4%) 10.98
(10.4%) -40.1% ( -56% - -17%) 0.000
BrowseDateTaxoFacets 21.36 (16.4%) 12.98
(10.6%) -39.2% ( -56% - -14%) 0.000
BrowseDayOfYearTaxoFacets 21.33 (16.4%) 13.04
(10.5%) -38.8% ( -56% - -14%) 0.000
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]