While orthogonal, I'd rather we not write code that exists purely to solve this problem. Segment sizes are capped configurably in the MergePolicy; you can lower the cap if the FixedBitSet would be too large for the largest segments. For single-segment indexes (and the status quo today), maybe the solution is GC tuning, as this is really a GC ergonomics matter.
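
For a rough sense of where the line sits, here is a back-of-the-envelope sketch. It assumes G1 treats an allocation as humongous once it reaches half a heap region, and that a FixedBitSet costs about one bit per document in its backing long[] (the small array header is ignored). The class name, the 4 MiB region size, and the 50M-doc segment are made up for illustration, not taken from any real deployment.

```java
public final class HumongousEstimate {

    // Approximate bytes of the long[] backing a bit set with numBits bits
    // (one 8-byte word per 64 bits; the array header is ignored).
    static long bitSetBytes(long numBits) {
        long words = (numBits + 63) / 64;
        return words * Long.BYTES;
    }

    // Assumption: G1 allocates an object as humongous when its size is
    // at least half of the heap region size.
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    // Largest document count whose bit set stays below the humongous
    // threshold for the given region size (same approximations as above).
    static long maxDocsBelowThreshold(long regionBytes) {
        long maxWords = (regionBytes / 2 - 1) / Long.BYTES;
        return maxWords * 64;
    }

    public static void main(String[] args) {
        long regionBytes = 4L << 20;  // hypothetical -XX:G1HeapRegionSize=4m
        long maxDoc = 50_000_000L;    // hypothetical largest segment

        long bytes = bitSetBytes(maxDoc);
        System.out.println("bit set bytes: " + bytes);
        System.out.println("humongous: " + isHumongous(bytes, regionBytes));
        System.out.println("max docs below threshold: "
                + maxDocsBelowThreshold(regionBytes));
    }
}
```

Under these assumptions the same arithmetic can be run in either direction: pick a region size and derive a segment size cap for the MergePolicy, or pick the largest expected segment and derive the region size that GC tuning would need.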

On Tue, Aug 20, 2024 at 11:23 AM Michael Gibney <mich...@michaelgibney.net> wrote:
>
> Interesting -- although certainly related, I think these are somewhat
> orthogonal questions. You could well have a merge strategy/heap and gc
> configuration/index size that would have the same "humongous object"
> problem even under a per-segment cache approach (certainly for cores
> optimized to a single segment, but also in other cases).
>
> On Mon, Aug 19, 2024 at 7:12 PM David Smiley <dsmi...@apache.org> wrote:
> >
> > On Mon, Aug 19, 2024 at 2:32 PM Michael Gibney
> > <mich...@michaelgibney.net> wrote:
> > > For a more robust solution than fussing with G1HeapRegionSize, I'm
> > > wondering if it might be appropriate to change the implementation of
> > > BitDocSet so that larger instances will be backed by an array of
> > > multiple smaller FixedBitSet instances. This would introduce some
> > > extra complexity for DocSets over large indexes, but it shouldn't be
> > > terrible; it could be cleanly implemented and would ensure that we
> > > never allocate humongous objects in the service of BitDocSets ...
> >
> > IMO I don't think this is quite the right direction to go in.
> > Instead, we should prioritize a FixedBitSet per segment -- basically a
> > segment level filterCache. This exists at the Lucene level
> > (IndexSearcher QueryCache); we could stop disabling it. It'd go with
> > the new multiThreaded query stuff wonderfully. But would then want to
> > do changes in BitDocSet so that it becomes an aggregate of
> > FixedBitSets sourced from that cache. At least this is a strawman
> > proposal; needs more thought!
> >
> > Note that Yonik resisted ending Solr's top level FixedBitSet on the
> > grounds that benchmarks are needed to show that a segment approach is
> > better. But that's challenging -- we will find the status quo is
> > faster for some situations and a segment level is faster for others.
> > Shrug.
> >
> > ~ David
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> > For additional commands, e-mail: dev-h...@solr.apache.org