> > > > I'm looking to see if I can submit a patch to fix this, but it seems
> > > > like the durability bit field for devices may be only 2 bits, is that
> > > > right?
> > > 
> > > That gets you values of 0-3. Why is that not enough?
> >
> > In bch2_mi_to_cpu, it looks like durability is encoded with a "bias" 
> > (default value) that maps {0,1,2,3} => {1,0,1,2}.
> >
> >             .durability = BCH_MEMBER_DURABILITY(mi)
> >                   ? BCH_MEMBER_DURABILITY(mi) - 1
> >                   : 1,
> >
> > This is pretty unfortunate, because it looks like if I want to use RAID6 
> > (replicas=3), I can't represent a device as having inherent durability of 
> > RAID6 (durability=3).
> >
> > It doesn't look like too much work to add a feature flag 
> > `BCH_FEATURE_durability_bias_v2` which when set, modifies the bias to 
> > unconditionally add one to the 2-bit field, mapping {0,1,2,3} to {1,2,3,4}. 
> > That would support even very large erasure encoded arrays as well, where 
> > you might use something like RS (56,4) for a common 60 drive JBOD. 
> > Practically speaking though I don't think anyone uses stripes that wide in 
> > a single array. At least not for spinning rust, but it's been a long time 
> > since I've worked with enterprise storage and I understand the rules have 
> > changed with flash now.
> >
> > I can submit patches for implementing the feature if you want me to submit 
> > them as a PR. Not sure about your stance on LLM-authored code though.
> 
> Actually there's an easier way, which I've done a few different times
before. We can extend BCH_MEMBER_DURABILITY to 4 bits (should be
sufficient, no?), with the high bits going whenever we've got room in
bch_member.
> 
> Rename BCH_MEMBER_DURABILITY -> BCH_MEMBER_DURABILTIY_LO
> 
> BCH_MEMBER_DURABILITY_HI for the new two bits
> 
> Then write new get/set functions for BCH_MEMBER_DURABILITY that
reads/stores from the lo and hi fields.
> 
> But we'd still want a new on disk format version for this, and then use
bch2_request_incompat_feature() whenever attempting to set a durability
htat doesn't fit in the old 2 bit field.

Do we want the new field to be additive after saturating 
BCH_MEMBER_DURABILITY_LO at 2, rather than treat it as a 4 bit field which 
could result in an older kernel seeing 0b01 and interpreting it as 0? So:

LO    HI    VALUE
# existing range:
00    00    1
01    00    0
10    00    1
11    00    2
# expanded range:
11    01    3
11    10    4
11    11    5

Then an older kernel will read any device with durability >2 as having 
durability=2, which is not ideal but I worry that durability=0 might result in 
undefined (or unspecified?) behavior.

Reply via email to