magibney commented on code in PR #1184:
URL: https://github.com/apache/solr/pull/1184#discussion_r1028792927
##########
solr/core/src/java/org/apache/solr/search/BitDocSet.java:
##########
@@ -194,19 +194,31 @@ public void addAllTo(FixedBitSet target) {
@Override
public DocSet andNot(DocSet other) {
- FixedBitSet newbits = bits.clone();
+ FixedBitSet newbits = getFixedBitSetClone();
+ andNot(newbits, other);
+ return new BitDocSet(newbits);
+ }
+
+ /**
+ * Helper method for andNot that takes FixedBitSet and DocSet. This modifies
the provided
+ * FixedBitSet to remove all bits contained in the DocSet argument --
equivalent to calling
+ * a.andNot(b), but modifies the state of the FixedBitSet instead of
returning a new FixedBitSet.
+ *
+ * @param bits FixedBitSet to operate on
+ * @param other The DocSet to compare to
+ */
+ protected static void andNot(FixedBitSet bits, DocSet other) {
if (other instanceof BitDocSet) {
- newbits.andNot(((BitDocSet) other).bits);
+ bits.andNot(((BitDocSet) other).bits);
} else {
DocIterator iter = other.iterator();
while (iter.hasNext()) {
int doc = iter.nextDoc();
- if (doc < newbits.length()) {
- newbits.clear(doc);
+ if (doc < bits.length()) {
+ bits.clear(doc);
Review Comment:
it occurs to me: now that this method no longer returns the FixedBitSet, it
could easily return `boolean`, something along the lines of `true: may have
been modified`, `false: definitely not modified`, detecting the latter for the
non-BitDocSet case by calling `bits.getAndClear(int)`. Not sure it'd be worth
it, but thought it might be worth mentioning ... (could use `false` return
value to avoid resetting size, which I _think_ actually might make.a difference
for some use cases: e.g., a cached positive filter and small (non-BitDocSet)
negative filters that don't actually exclude anything).
##########
solr/core/src/java/org/apache/solr/search/MutableBitDocSet.java:
##########
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.search;
+
+import org.apache.lucene.util.FixedBitSet;
+
+/**
+ * A {@link BitDocSet} based implementation that mutates the underlying bits
for andNot and
+ * intersection. This allows for computing the combinations of sets without
duplicating the
+ * underlying array. This MutableBitDocSet should not be cached because it can
be modified.
+ *
+ * @since solr 9.2
+ */
+class MutableBitDocSet extends BitDocSet {
+ private MutableBitDocSet(FixedBitSet bits) {
+ super(bits);
+ }
+
+ /**
+ * Returns a mutable BitDocSet that is a copy of the provided BitDocSet.
+ *
+ * @param bitDocSet a BitDocSet
+ * @return copy of bitDocSet that is now mutable
+ */
+ public static MutableBitDocSet fromBitDocSet(BitDocSet bitDocSet) {
+ return new MutableBitDocSet(bitDocSet.getFixedBitSetClone());
Review Comment:
It will often (indeed, usually?) be the case that we have `size` already
calculated on the input `bitDocSet`. It could be worth passing this along using
the `BitDocSet(FixedBitSet bits, int size)` ctor? This would only have a
practical benefit in the case where you have one positive filter and a bunch of
small negative filters that don't actually modify the set (in conjunction with
the suggestion to make `andNot(FixedBitSet bits, DocSet other)` return
boolean). But it costs nothing to preserve this information if it already
exists.
##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -1190,38 +1192,52 @@ public ProcessedFilter getProcessedFilter(DocSet
setFilter, List<Query> queries)
}
Query posQuery = QueryUtils.getAbs(q);
- sets[end] = getPositiveDocSet(posQuery);
+ DocSet docSet = getPositiveDocSet(posQuery);
// Negative query if absolute value different from original
if (Objects.equals(q, posQuery)) {
- neg[end] = false;
- // keep track of the smallest positive set.
- // This optimization is only worth it if size() is cached, which it
would
- // be if we don't do any set operations.
- int sz = sets[end].size();
- if (sz < smallestCount) {
- smallestCount = sz;
- smallestIndex = end;
- answer = sets[end];
+ // keep track of the smallest positive set; use "answer" for this.
+ if (answer == null) {
+ answer = docSet;
+ continue;
}
+ // note: assume that size() is cached. It generally comes from the
cache, so should be.
+ if (docSet.size() < answer.size()) {
+ // swap answer & docSet so that answer is smallest
+ DocSet tmp = answer;
+ answer = docSet;
+ docSet = tmp;
+ }
+ neg[end] = false;
} else {
neg[end] = true;
}
+ sets[end++] = docSet;
+ } // end of queries
- end++;
- }
+ if (end > 0) {
+ // Are all of our normal cached filters negative?
+ if (answer == null) {
+ answer = getLiveDocSet();
+ }
- // Are all of our normal cached filters negative?
- if (end > 0 && answer == null) {
- answer = getLiveDocSet();
- }
+ // This optimizes for the case where we have more than 2 filters and
instead
+ // of copying the bitsets we make one mutable bitset. We should only do
this
+ // for BitDocSet since it clones the backing bitset for andNot and
intersection.
+ if (end > 1 && answer instanceof BitDocSet) {
+ answer = MutableBitDocSet.fromBitDocSet((BitDocSet) answer);
+ }
- // do negative queries first to shrink set size
- for (int i = 0; i < end; i++) {
- if (neg[i]) answer = answer.andNot(sets[i]);
- }
+ // do negative queries first to shrink set size
+ for (int i = 0; i < end; i++) {
+ if (neg[i]) answer = answer.andNot(sets[i]);
+ }
+
+ for (int i = 0; i < end; i++) {
+ if (!neg[i]) answer = answer.intersection(sets[i]);
+ }
- for (int i = 0; i < end; i++) {
- if (!neg[i] && i != smallestIndex) answer = answer.intersection(sets[i]);
+ // Make sure to keep answer as an immutable DocSet if we made it mutable
+ answer = MutableBitDocSet.unwrapIfMutable(answer);
}
// ignore "answer" if it simply matches all docs
Review Comment:
Out of scope of this PR, but while looking at this code: will
`answer.size()` always be called downstream of this? If so, then it's fine to
force size calculation here; if _not_, I think there should be better ways to
check this condition than naively calling `answer.size()` after all
intersections are done.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]