Have you put up a prototype that solves your problem?

Enrico

On Thu, Aug 21, 2025, 04:49 xiangying meng <xiangy...@apache.org> wrote:

> Hi Enrico,
>
> Thank you for your feedback. Regarding the issue you mentioned, I have
> two thoughts:
>
> First: As I mentioned in Proposal 1, whether to stop writing to all
> disks when one disk is full is controlled by
> isReadOnlyModeOnAnyDiskFullEnabled.
> When the user sets isReadOnlyModeOnAnyDiskFullEnabled = false, it
> indicates that they want the system to continue accepting write
> requests even after one disk is full. In this case, writing to a
> ledger disk without available space will definitely fail, so disabling
> GC for that disk should not be a problem. Meanwhile, other normally
> functioning ledger disks should continue to work as expected by the
> user, including running GC as usual. Stopping GC on all disks in this
> scenario might not align with the user's intention.
>
> Second: When one ledger disk is full while other disks still have
> space, new data should not continue to be written to the full ledger
> disk. This is something that still needs to be optimized, not an
> already-defined behavior.
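
The second point (stop placing new data on a full ledger disk while other disks still have room) can be illustrated as a directory-selection rule. This is a minimal, hypothetical sketch, not BookKeeper code: `LedgerDir`, `pickWritableDir` and the `fullThreshold` parameter are illustrative names, and choosing the least-used directory is an assumed placement policy.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class WritableDirSketch {

    // Hypothetical snapshot of one ledger directory's space; not a BookKeeper type.
    record LedgerDir(String path, long usableBytes, long totalBytes) {
        // Fraction of the disk in use, from 0.0 (empty) to 1.0 (full).
        double usage() {
            return 1.0 - (double) usableBytes / totalBytes;
        }
    }

    // Returns the least-used directory below fullThreshold, or empty if every
    // disk is at or above it (the caller would then fall back to read-only mode).
    static Optional<LedgerDir> pickWritableDir(List<LedgerDir> dirs, double fullThreshold) {
        return dirs.stream()
                .filter(d -> d.usage() < fullThreshold)
                .min(Comparator.comparingDouble(LedgerDir::usage));
    }

    public static void main(String[] args) {
        List<LedgerDir> dirs = List.of(
                new LedgerDir("/data/ledgers0", 10L, 1000L),   // 99% used: skipped
                new LedgerDir("/data/ledgers1", 600L, 1000L),  // 40% used
                new LedgerDir("/data/ledgers2", 300L, 1000L)); // 70% used
        // Prints /data/ledgers1
        System.out.println(pickWritableDir(dirs, 0.95).map(LedgerDir::path).orElse("none"));
    }
}
```

An empty result corresponds to every disk being full, which is the case where whole-bookie read-only mode would still apply.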
>
> BR,
> Xiangying
>
> On Wed, Aug 20, 2025 at 8:48 PM Enrico Olivelli <eolive...@gmail.com>
> wrote:
> >
> > Hello Xiangying,
> > thanks for sharing your problem and your proposals.
> >
> > One issue I can see is that there is no way for the Bookie to go into
> > "partial readonly mode".
> > If you stop GC on only one disk and the Bookie keeps accepting writes
> > for that disk, the problem will keep getting worse.
> >
> >
> > Best
> > Enrico
> >
> >
> > On Wed, Aug 20, 2025 at 14:21 xiangying meng <xiangy...@apache.org>
> > wrote:
> >
> > > Hi BookKeeper Community,
> > >
> > > I’d like to propose a modification to how garbage collection (GC)
> > > handles disk-full scenarios.
> > > Currently, when any ledger disk reaches full capacity,
> > > suspendMajorGC()/suspendMinorGC() pause GC on all disks.
> > > This behavior can unnecessarily impact healthy disks, especially in
> > > cases of uneven disk utilization.
> > >
> > > Consider two scenarios:
> > > 1. Even Data Distribution:
> > >    All disks are nearly full, and one fills up first. Temporarily
> > > disabling GC only on the full disk (before propagating suspension to
> > > others) is safe.
> > > 2. Uneven Data Distribution:
> > >    Due to write skew or cleanup inconsistencies, a single disk may
> > > fill up while others still have free space. Halting GC globally
> > > penalizes operational disks.
> > >
> > > To address this, I propose three solutions:
> > >
> > > Option 1: Reuse isReadOnlyModeOnAnyDiskFullEnabled. When
> > > isReadOnlyModeOnAnyDiskFullEnabled == true, stop GC on all disks;
> > > otherwise, other disks should continue normal operations without GC
> > > suspension.
> > > Reason: isReadOnlyModeOnAnyDiskFullEnabled already reflects the user’s
> > > intent about whether to stop all bookie writes when any single disk is
> > > full, and GC is itself a writer: it may need to create new files to
> > > rewrite live data before the old files are cleaned up.
> > >
> > > Option 2: When a single disk becomes full, only stop GC for that
> > > specific disk. Other disks should continue their GC processes
> > > uninterrupted.
> > > Reason: This issue should be treated as a bug fix rather than a
> > > breaking change. No configuration is needed; simply fix the current
> > > behavior.
> > >
> > > Option 3: Add a new configuration to control whether to stop GC on
> > > other disks when any single disk becomes full.
> > > Reason: This does not change the existing behavior but allows users to
> > > configure it according to their needs.
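
The three options differ mainly in where a single "suspend GC on every disk?" decision comes from. A minimal sketch, assuming a hypothetical `LedgerDisk` view of each disk: under Option 1 the boolean would be isReadOnlyModeOnAnyDiskFullEnabled, under Option 2 it would always be `false`, and under Option 3 it would be a new, not-yet-named setting. `LedgerDisk` and `disksToSuspendGc` are illustrative names, not BookKeeper APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class GcSuspensionSketch {

    // Hypothetical per-disk view; not a BookKeeper type.
    record LedgerDisk(String path, boolean full) {}

    // Returns the disks whose major/minor GC should be suspended.
    // suspendAllOnAnyDiskFull = true reproduces the current global behavior;
    // false suspends GC only on the disks that are actually full.
    static List<String> disksToSuspendGc(List<LedgerDisk> disks, boolean suspendAllOnAnyDiskFull) {
        List<String> suspended = new ArrayList<>();
        boolean anyFull = disks.stream().anyMatch(LedgerDisk::full);
        if (!anyFull) {
            return suspended; // no disk is full: GC keeps running everywhere
        }
        for (LedgerDisk disk : disks) {
            if (suspendAllOnAnyDiskFull || disk.full()) {
                suspended.add(disk.path());
            }
        }
        return suspended;
    }

    public static void main(String[] args) {
        List<LedgerDisk> disks = List.of(
                new LedgerDisk("/data/ledgers0", true),
                new LedgerDisk("/data/ledgers1", false));
        // Option 1 with isReadOnlyModeOnAnyDiskFullEnabled = true (current behavior)
        System.out.println(disksToSuspendGc(disks, true));  // [/data/ledgers0, /data/ledgers1]
        // Option 2, or Option 1 with the flag set to false
        System.out.println(disksToSuspendGc(disks, false)); // [/data/ledgers0]
    }
}
```

Keeping the decision in one boolean means Options 1 and 3 only change which configuration value is passed in, while Option 2 hard-codes `false`.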
> > >
> > > I think Option 2 is the most appropriate, as it directly addresses the
> > > problem without introducing additional configuration complexity.
> > >
> > > Looking forward to your feedback.
> > >
> > > BR,
> > > Xiangying
> > >
>
