On Thu, Feb 29, 2024 at 04:16:00PM -0500, Kent Overstreet wrote: > On Thu, Feb 29, 2024 at 01:44:27PM -0500, Brian Foster wrote: > > On Wed, Feb 28, 2024 at 11:10:12PM -0500, Kent Overstreet wrote: > > > I think it ended up not needing to be moved, and I just forgot to drop > > > it - originally I disallowed accounting entries that referenced > > > nonexistent devices, but that wasn't workable so now it's only nonzero > > > accounting keys that aren't allowed to reference nonexistent devices. > > > > > > I'll see if I can delete it. > > > > > > > Do you mean to delete the change that moves the call, or the flush call > > entirely? > > Delte the change, I think there's further cleanup (& probably bugs to > fix) possible with that flush call but I'm not going to get into it > right now. >
Ok, just trying to determine whether I need to look back and make sure this doesn't regress the problem this originally fixed. > > > +/* > > > + * Notes on disk accounting: > > > + * > > > + * We have two parallel sets of counters to be concerned with, and both > > > must be > > > + * kept in sync. > > > + * > > > + * - Persistent/on disk accounting, stored in the accounting btree and > > > updated > > > + * via btree write buffer updates that treat new accounting keys as > > > deltas to > > > + * apply to existing values. But reading from a write buffer btree is > > > + * expensive, so we also have > > > + * > > > > I find the wording a little odd here, and I also think it would be > > helpful to explain how/from where the deltas originate. For example, > > something along the lines of: > > > > "Persistent/on disk accounting, stored in the accounting btree and > > updated via btree write buffer updates. Accounting updates are > > represented as deltas that originate from <somewhere? trans triggers?>. > > Accounting keys represent these deltas through commit into the write > > buffer. The accounting/delta keys in the write buffer are then > > accumulated into the appropriate accounting btree key at write buffer > > flush time." > > yeah, that's worth including. > > There's an interesting point that you're touching on; btree write buffer > are always dependent state changes from some other (non write buffer) > btree; we never look at a write buffer btree and generate an update > there - we can't, reading from a write buffer btree doesn't get you > anything consistent or up to date. > > So in normal operation it really only makes sense to do write buffer > updates from a transactional trigger - that's the only way to use them > and have them be consistent with the resst of the filesystem. > > And since triggers work by comparing old and new, they naturally > generate updates that are deltas. > Hm that is interesting, I hadn't made that connection. Thanks. Brian > > > + * - In memory accounting, where accounting is stored as an array of > > > percpu > > > + * counters, indexed by an eytzinger array of disk acounting > > > keys/bpos (which > > > + * are the same thing, excepting byte swabbing on big endian). > > > + * > > > > Not really sure about the keys vs. bpos thing, kind of related to my > > comments on the earlier patch. It might be more clear to just elide the > > implementation details here, i.e.: > > > > "In memory accounting, where accounting is stored as an array of percpu > > counters that are cheap to read, but not persistent. Updates to in > > memory accounting are propagated from the transaction commit path." > > > > ... but NBD, and feel free to reword, drop and/or correct any of that > > text. > > It's there because bch2_accounting_mem_read() takes a bpos when it > should be a disk_accounting_key. I'll fix that if I can... > > > > + * Cheap to read, but non persistent. > > > + * > > > + * To do a disk accounting update: > > > + * - initialize a disk_accounting_key, to specify which counter is being > > > update > > > + * - initialize counter deltas, as an array of 1-3 s64s > > > + * - call bch2_disk_accounting_mod() > > > + * > > > + * This queues up the accounting update to be done at transaction commit > > > time. > > > + * Underneath, it's a normal btree write buffer update. > > > + * > > > + * The transaction commit path is responsible for propagating updates to > > > the in > > > + * memory counters, with bch2_accounting_mem_mod(). > > > + * > > > + * The commit path also assigns every disk accounting update a unique > > > version > > > + * number, based on the journal sequence number and offset within that > > > journal > > > + * buffer; this is used by journal replay to determine which updates > > > have been > > > + * done. > > > + * > > > + * The transaction commit path also ensures that replicas entry > > > accounting > > > + * updates are properly marked in the superblock (so that we know > > > whether we can > > > + * mount without data being unavailable); it will update the superblock > > > if > > > + * bch2_accounting_mem_mod() tells it to. > > > > I'm not really sure what this last paragraph is telling me, but granted > > I've not got that far into the code yet either. > > yeah that's for a whole different subsystem that happens to be slaved to > the accounting - replicas.c, which also used to help out quite a bit > with the accounting but now it's pretty much just for managing the > superblock replicas section. > > The superblock replicas section is just a list of entries, where each > entry is a list of devices - "there is replicated data present on this > set of devices". We also have full counters of how much data is present > replicated across each set of devices, so the superblock section is just > a truncated version of the accounting - "data exists on these devices", > instead of saying how much. >
