Have you put up a prototype that solves your problem?

Enrico

On Thu, Aug 21, 2025, 04:49 xiangying meng <xiangy...@apache.org> wrote:

> Hi Enrico,
>
> Thank you for your feedback. Regarding the issue you mentioned, I have
> two thoughts:
>
> First: As I mentioned in Option 1, whether to stop writing to all
> disks when one disk is full is controlled by
> isReadOnlyModeOnAnyDiskFullEnabled.
> When the user sets isReadOnlyModeOnAnyDiskFullEnabled = false, it
> indicates that they want the system to continue accepting write
> requests even after one disk is full. In this case, writing to a
> ledger disk without available space will definitely fail, so disabling
> GC for that disk should not be a problem. Meanwhile, other normally
> functioning ledger disks should continue to work as expected by the
> user, including running GC as usual. Stopping GC on all disks in this
> scenario might not align with the user's intention.
>
> Second: When one ledger disk is full while other disks still have
> space, new data should not continue to be written to the full ledger
> disk. This is a point that needs optimization, rather than an already
> defined feature.
>
> BR,
> Xiangying
>
> On Wed, Aug 20, 2025 at 8:48 PM Enrico Olivelli <eolive...@gmail.com>
> wrote:
> >
> > Hello Xiangying,
> > thanks for sharing your problem and your proposals.
> >
> > One issue I can see is that there is no way for the Bookie to go into
> > "partial readonly mode".
> > If you stop GC only on one disk and the Bookie accepts writes for that
> > disk, the problem is going to get worse and worse.
> >
> >
> > Best
> > Enrico
> >
> >
> > On Wed, Aug 20, 2025 at 14:21 xiangying meng <xiangy...@apache.org>
> > wrote:
> >
> > > Hi BookKeeper Community,
> > >
> > > I’d like to propose a modification to how garbage collection (GC)
> > > handles disk-full scenarios.
> > > Currently, when any ledger disk reaches full capacity,
> > > suspendMajorGC()/suspendMinorGC() pauses GC for all disks.
> > > This behavior can unnecessarily impact healthy disks, especially in
> > > cases of uneven disk utilization.
> > >
> > > Consider two scenarios:
> > > 1. Even Data Distribution:
> > > All disks are nearly full, and one fills up first. Temporarily
> > > disabling GC only on the full disk (before propagating suspension to
> > > others) is safe.
> > > 2. Uneven Data Distribution:
> > > Due to write skew or cleanup inconsistencies, a single disk may
> > > fill up while others still have free space. Halting GC globally
> > > penalizes operational disks.
> > >
> > > To address this, I propose three solutions:
> > >
> > > Option 1: Reuse isReadOnlyModeOnAnyDiskFullEnabled. When
> > > isReadOnlyModeOnAnyDiskFullEnabled == true, stop GC on all disks;
> > > otherwise, other disks should continue normal operations without GC
> > > suspension.
> > > Reason: isReadOnlyModeOnAnyDiskFullEnabled reflects the user’s intent
> > > about whether to stop all bookie writes when any single disk is full,
> > > but GC might need to create new files for writing data ahead of
> > > cleanup.
> > >
> > > Option 2: When a single disk becomes full, only stop GC for that
> > > specific disk. Other disks should continue their GC processes
> > > uninterrupted.
> > > Reason: This issue should be treated as a bug fix rather than a
> > > breaking change. No configuration is needed; simply fix the current
> > > behavior.
> > >
> > > Option 3: Add a new configuration to control whether to stop GC on
> > > other disks when any single disk becomes full.
> > > Reason: This does not change the existing behavior but allows users to
> > > configure it according to their needs.
> > >
> > > I think Option 2 is the most appropriate, as it directly addresses the
> > > problem without introducing additional configuration complexity.
> > >
> > > Looking forward to your feedback.
> > >
> > > BR,
> > > Xiangying
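
For readers following the thread, here is a minimal sketch of how the three options differ in which disks get their GC suspended. It is not BookKeeper code: the class GcSuspensionSketch, the Mode enum, the method disksToSuspend and the flag suspendAllOnAnyDiskFull are hypothetical names made up for illustration. Only the idea of the isReadOnlyModeOnAnyDiskFullEnabled setting comes from the discussion above, and it is modelled here as a plain boolean parameter.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class GcSuspensionSketch {

    /** Hypothetical modes mirroring the three options proposed in the thread. */
    enum Mode { OPTION_1_REUSE_READ_ONLY_FLAG, OPTION_2_PER_DISK_ONLY, OPTION_3_NEW_CONFIG }

    /**
     * Returns the disks whose GC would be suspended, given which disks are full.
     * readOnlyOnAnyDiskFull stands in for isReadOnlyModeOnAnyDiskFullEnabled;
     * suspendAllOnAnyDiskFull is an assumed name for the new setting in Option 3.
     */
    static Set<String> disksToSuspend(Mode mode,
                                      List<String> allDisks,
                                      Set<String> fullDisks,
                                      boolean readOnlyOnAnyDiskFull,
                                      boolean suspendAllOnAnyDiskFull) {
        Set<String> suspended = new LinkedHashSet<>();
        if (fullDisks.isEmpty()) {
            return suspended; // nothing is full, so GC runs everywhere
        }
        boolean suspendAll;
        switch (mode) {
            case OPTION_1_REUSE_READ_ONLY_FLAG:
                suspendAll = readOnlyOnAnyDiskFull;
                break;
            case OPTION_3_NEW_CONFIG:
                suspendAll = suspendAllOnAnyDiskFull;
                break;
            default:
                suspendAll = false; // Option 2: never propagate to healthy disks
        }
        for (String disk : allDisks) {
            if (suspendAll || fullDisks.contains(disk)) {
                suspended.add(disk);
            }
        }
        return suspended;
    }

    public static void main(String[] args) {
        List<String> disks = Arrays.asList("disk-1", "disk-2", "disk-3");
        Set<String> full = Collections.singleton("disk-2");
        for (Mode mode : Mode.values()) {
            // Both flags set to true to show where each option diverges from Option 2.
            System.out.println(mode + " -> " + disksToSuspend(mode, disks, full, true, true));
        }
    }
}
```

Note that the sketch only covers which disks stop GC. It does not address the concern raised in the reply, namely whether the Bookie keeps accepting writes for the full disk.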