Although the questions are orthogonal, I'd rather we not write code
that exists purely to solve this problem.  Segment sizes are capped
configurably in the MergePolicy; you can lower the cap if the
FixedBitSet would be too large for the largest segments.  For
single-segment indexes (and the status quo today), maybe the solution
is GC tuning, as this is really a GC ergonomics matter.
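
For a rough sense of scale, here is a back-of-the-envelope sketch of
when a one-bit-per-document bitset would count as a G1 "humongous"
allocation.  It assumes G1 treats any object of at least half a region
as humongous, and it ignores the array header, so results are
approximate.  The class name, method names, region sizes and document
count are all hypothetical, chosen only for illustration; this is not
Lucene or Solr code.

```java
public final class BitSetSizing {
    private BitSetSizing() {}

    /** Number of 64-bit words needed to hold numBits bits (one bit per document). */
    public static long wordsFor(long numBits) {
        return (numBits + 63) / 64;
    }

    /** Approximate size in bytes of the backing long[] (array header ignored). */
    public static long bitSetBytes(long maxDoc) {
        return wordsFor(maxDoc) * Long.BYTES;
    }

    /** Assumption: an allocation of at least half a G1 region is humongous. */
    public static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    /** Largest document count whose bitset stays below the assumed threshold. */
    public static long maxDocsBelowThreshold(long regionBytes) {
        long maxBytes = regionBytes / 2 - 1;   // strictly below half a region
        return (maxBytes / Long.BYTES) * 64;   // whole words only, 64 docs per word
    }

    public static void main(String[] args) {
        long maxDoc = 50_000_000L;             // hypothetical segment (or index) size
        long smallRegion = 4L * 1024 * 1024;   // hypothetical 4 MiB region
        long largeRegion = 32L * 1024 * 1024;  // hypothetical 32 MiB region

        long bytes = bitSetBytes(maxDoc);
        System.out.println("bitset bytes: " + bytes);
        System.out.println("humongous at 4 MiB regions: " + isHumongous(bytes, smallRegion));
        System.out.println("humongous at 32 MiB regions: " + isHumongous(bytes, largeRegion));
        System.out.println("max docs below threshold at 4 MiB regions: "
            + maxDocsBelowThreshold(smallRegion));
    }
}
```

The same arithmetic applies whether the bitset covers a whole index or
a single segment; only the document count changes.  Translating a
document count into a MergePolicy setting is a separate step not
covered here.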

On Tue, Aug 20, 2024 at 11:23 AM Michael Gibney
<mich...@michaelgibney.net> wrote:
>
> Interesting -- although certainly related, I think these are somewhat
> orthogonal questions. You could well have a merge strategy/heap and gc
> configuration/index size that would have the same "humongous object"
> problem even under a per-segment cache approach (certainly for cores
> optimized to a single segment, but also in other cases).
>
> On Mon, Aug 19, 2024 at 7:12 PM David Smiley <dsmi...@apache.org> wrote:
> >
> > On Mon, Aug 19, 2024 at 2:32 PM Michael Gibney
> > <mich...@michaelgibney.net> wrote:
> > > For a more robust solution than fussing with G1HeapRegionSize, I'm
> > > wondering if it might be appropriate to change the implementation of
> > > BitDocSet so that larger instances will be backed by an array of
> > > multiple smaller FixedBitSet instances. This would introduce some
> > > extra complexity for DocSets over large indexes, but it shouldn't be
> > > terrible; it could be cleanly implemented and would ensure that we
> > > never allocate humongous objects in the service of BitDocSets ...
> >
> > I don't think this is quite the right direction to go in.
> > Instead, we should prioritize a FixedBitSet per segment -- basically a
> > segment-level filterCache.  This exists at the Lucene level
> > (IndexSearcher QueryCache); we could stop disabling it.  It would go
> > wonderfully with the new multithreaded query work.  But we would then
> > want to change BitDocSet so that it becomes an aggregate of
> > FixedBitSets sourced from that cache.  At least this is a strawman
> > proposal; it needs more thought!
> >
> > Note that Yonik resisted ending Solr's top level FixedBitSet on the
> > grounds that benchmarks are needed to show that a segment approach is
> > better.  But that's challenging -- we will find the status quo is
> > faster for some situations and a segment level is faster for others.
> > Shrug.
> >
> > ~ David
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@solr.apache.org
> > For additional commands, e-mail: dev-h...@solr.apache.org
> >