Re: [DISCUSS] Statistics maintenance in FoundationDB

Paul Davis Tue, 09 Apr 2019 08:12:54 -0700

I've only got two notes for color.

I'm pretty sure that keeping the update_seq as a key could be fine,
since it's an atomic op underneath and shouldn't conflict. However,
given that we're looking to store an Incarnation and Batch Id with
every version stamp, I still think it makes better sense to read from
the end of the changes feed, as that means we're only doing the update
logic in a single place.
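
As a rough illustration of reading the sequence from the end of the
changes feed, here is a minimal Python sketch. It uses an in-memory
stand-in rather than the FoundationDB client; the OrderedKV class, the
"changes" prefix, and the integer sequences (in place of version
stamps) are all assumptions made for the example.

```python
import bisect

CHANGES = "changes"  # hypothetical subspace prefix, standing in for ?CHANGES


class OrderedKV:
    """In-memory stand-in for an ordered key-value store (not the FDB API)."""

    def __init__(self):
        self._keys = []    # kept in sorted order
        self._values = {}

    def set(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get_range(self, begin, end, limit=0, reverse=False):
        """Return (key, value) pairs with begin <= key < end."""
        lo = bisect.bisect_left(self._keys, begin)
        hi = bisect.bisect_left(self._keys, end)
        keys = self._keys[lo:hi]
        if reverse:
            keys = keys[::-1]
        if limit:
            keys = keys[:limit]
        return [(k, self._values[k]) for k in keys]


def current_update_seq(kv):
    """Reverse, limit=1 read over the changes space; None if it is empty."""
    rows = kv.get_range((CHANGES,), (CHANGES, float("inf")),
                        limit=1, reverse=True)
    if not rows:
        return None
    (_prefix, seq), _value = rows[0]
    return seq


kv = OrderedKV()
kv.set(("docs", "a"), "{}")  # a key outside the changes space
for seq, doc_id in [(1, "a"), (2, "b"), (3, "c")]:
    kv.set((CHANGES, seq), doc_id)
latest = current_update_seq(kv)
```

The point of the sketch is that no separate update_seq key is written:
the newest entry in the changes space is the only source of the value.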


For the offset calculation I'll just mention that it's the same
scenario as custom JS reduces, in that we need to be able to calculate
some value over an arbitrary range of keys. For custom JS reduces I
could see having the complexity (if we go that route), but for
offset it's not very useful, especially given the fdb transaction
limitations, which mean the value is not necessarily valid any time we
have to use multiple transactions to satisfy a read from the index.
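
To make the multi-transaction concern concrete, here is a small Python
sketch under stated assumptions: offset is taken to mean the number of
keys before the start key, each batch stands in for one transaction,
and the function and variable names are hypothetical. It does not use
the FoundationDB client.

```python
def offset_by_scan(read_keys, start_key, batch_size, between_batches=None):
    """Count keys < start_key, one batch per simulated transaction."""
    count = 0
    cursor = None  # last key counted so far
    while True:
        # Each batch sees the store as it is at that moment.
        candidates = [k for k in read_keys()
                      if (cursor is None or k > cursor) and k < start_key]
        batch = candidates[:batch_size]
        if not batch:
            return count
        count += len(batch)
        cursor = batch[-1]
        if between_batches:
            between_batches()  # simulate writes landing between transactions


store = {"a", "c", "e", "g"}
offset_at_start = sum(1 for k in store if k < "g")

def concurrent_writes():
    # "b" lands behind the cursor (missed), "f" ahead of it (counted).
    store.update({"b", "f"})

scanned = offset_by_scan(lambda: sorted(store), "g", batch_size=2,
                         between_batches=concurrent_writes)
offset_at_end = sum(1 for k in store if k < "g")
```

In this run the scanned count matches neither the offset before the
scan nor the offset after it, which is the sense in which a value
computed across several transactions may not be valid.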

On Tue, Apr 9, 2019 at 3:12 AM Robert Newson <rnew...@apache.org> wrote:
>
> Hi,
>
> I agree with all of this.
>
> On "sizes", we should clean up the various places that the different sizes 
> are reported. I suggest we stick with just the "sizes" object, which will 
> have two items, 'external' which will be jiffy's estimate of the body as json 
> plus the length of all attachments (only if held within fdb) and 'file' which 
> will be the sum of the lengths of the keys and values in fdb for the 
> Directory (excluding the sum key/value itself). (the long way of saying I 
> agree with what you already said).
>
> On "offset", I agree we should remove it. It's of questionable value today, 
> so let's call it out as an API change in the appropriate RFC section. The fdb 
> release (ostensibly "4.0") is an opportunity to clean up some API cruft. 
> Given we know about this one early, we should also remove it in 3.0.
>
> --
>   Robert Samuel Newson
>   rnew...@apache.org
>
> On Mon, 8 Apr 2019, at 23:33, Adam Kocoloski wrote:
> > Hi all, a recent comment from Paul on the revision model RFC reminded
> > me that we should have a discussion on how we maintain aggregate
> > statistics about databases stored in FoundationDB. I’ll ignore the
> > statistics associated with secondary indexes for the moment, assuming
> > that the design we put in place for document data can serve as the
> > basis for an extension there.
> >
> > The first class of statistics are the ones we report in GET /<dbname>,
> > which are documented here:
> >
> > http://docs.couchdb.org/en/stable/api/database/common.html#get--db
> >
> > These fall into a few different classes:
> >
> > doc_count, doc_del_count: these should be maintained using
> > FoundationDB’s atomic operations. The revision model RFC enumerated all
> > the possible update paths and showed that we always have enough
> > information to know whether to increment or decrement each of these
> > counters; i.e., we always know when we’re removing the last
> > deleted=false branch, adding a new branch to a previously-deleted
> > document, etc.
> >
> > update_seq: this must _not_ be maintained as its own key; attempting to
> > do so would cause every write to the database to conflict with every
> > other write and kill throughput. Rather, we can do a limit=1 range read
> > on the end of the ?CHANGES space to retrieve the current sequence of
> > the database.
> >
> > sizes.*: things get a little weird here. Historically we relied on the
> > relationship between sizes.active and sizes.file to know when to
> > trigger a database compaction, but we don’t yet have a need for
> > compaction in the FDB-based data model and it’s not clear how we should
> > define these two quantities. The sizes.external field has also been a
> > little fuzzy. Ignoring the various definitions of “size” for the
> > moment, let’s agree that we’ll want to be tracking some set of byte
> > counts for each database. I think the way we should do this is by
> > extending the information stored in each edit branch in ?REVISIONS to
> > include the size(s) of the current revision. When we update a document
> > we need to compare the size(s) of the new revision with the size(s) of
> > the parent, and update the database level atomic counter(s)
> > appropriately. This requires an enhancement to RFC 001.
> >
> > I’d like to further propose that we track byte counts not just at a
> > database level but also across the entire Directory associated with a
> > single CouchDB deployment, so that FoundationDB administrators managing
> > multiple applications for a single cluster can have a better view of
> > per-Directory resource utilization without walking every single
> > database stored inside.
> >
> > Looking past the DB info endpoint, one other statistic worth discussing
> > is the “offset” field included with every response to an _all_docs
> > request. This is not something that we get for free in FoundationDB,
> > and I have to confess it seems to be of limited utility. We could
> > support this by implementing a tree structure by adding additional
> > aggregation keys on top of the keys stored in the _all_docs space, but
> > I’m skeptical that it’s worth baking this extra cost into every
> > database update and _all_docs operation. I’d like to hear others’
> > thoughts on this one.
> >
> > I haven’t yet looked closely at _stats and _system to see if any of
> > those metrics require specific support from FDB.
> >
> > Adam
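
The counter bookkeeping described in the quoted message for doc_count,
doc_del_count and sizes.* can be sketched roughly as below. This is a
minimal Python illustration, not the proposed implementation: a Counter
stands in for FoundationDB's atomic add, a single "size" counter stands
in for whatever byte counts are eventually chosen, and the function and
parameter names are assumptions.

```python
from collections import Counter


def counter_deltas(existed, was_live, is_live, parent_size, new_size):
    """Deltas for one document update.

    existed:     the document had any revision before this update
    was_live:    it had at least one deleted=false branch before
    is_live:     it has at least one deleted=false branch after
    parent_size: size of the parent revision (0 if there is none)
    new_size:    size of the new revision
    """
    deltas = Counter()
    deltas["size"] = new_size - parent_size
    if is_live and not was_live:
        deltas["doc_count"] += 1
        if existed:
            deltas["doc_del_count"] -= 1  # previously deleted, now live
    elif was_live and not is_live:
        deltas["doc_count"] -= 1          # last deleted=false branch removed
        deltas["doc_del_count"] += 1
    elif not existed:
        deltas["doc_del_count"] += 1      # document created already deleted
    return deltas


def apply_atomic(counters, deltas):
    """Stand-in for issuing one atomic add per counter key."""
    for name, delta in deltas.items():
        counters[name] += delta


counters = Counter()
apply_atomic(counters, counter_deltas(False, False, True, 0, 100))   # create
apply_atomic(counters, counter_deltas(True, True, True, 100, 140))   # update
apply_atomic(counters, counter_deltas(True, True, False, 140, 20))   # delete
```

Because each update only contributes a signed delta, writers never need
to read the counter values, which is what keeps them from conflicting
with each other.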
