Re: Statistics Module

Jan Lehnardt Sat, 31 Jan 2009 08:56:04 -0800


On 31 Jan 2009, at 02:12, Paul Davis wrote:


I find fault with the logic that long names in any way lead to
success. The two things that this style does offer is the focus on
small independent tests and a hierarchical categorization of tests.

Just pulling out the first section:

 stats: function(debug) {
   var open_databases_tests = {
     'should increment the number of open databases when creating a
db': function...
     'should increment the number of open databases when opening a
db': function...

'should decrement the number of open databases when deleting':function...

     'should keep the same number of open databases when reaching the
max_dbs_open limit': function...
     'should return 0 for number of open databases after call to
restartServer()': function....

Contrast that with:

stats: {
   database: {
       create_inc: function() .....
       open_inc: function() ...
       delete_dec: function() ...
       max_open: function() ...
       reset: function() ....
   }
}

Sorry, but if if you read "test 'stats -> database -> create_inc'failed",

you don't know what is going on and have to dig into test code. If
you read test "test 'should increment open_databases counter' failed".

The idea is to have natural language identifiers here.

Everything in your source code should be trying to convey meaning as
quickly and efficiently as possible. All these do is add redundant
information and make the tests a pain to read.


"A pain" as in "for a human", your proposal is sure easy to parse for a
computer (just teasing here :-).

I think we kicked off at the extreme end of a spectrum and you're
advocating the other one. I'm sure we'll meet on middle ground :)


Cheers
Jan
--

I definitely agree that we could spend some time organizing our test
suite into smaller files. Perhaps one file per current test, and then
split up some of the bigger tests into smaller bits. eunit tests for
the internals is also made of win. I am in no way against stealing
good ideas, but long names just isn't one of them.


HTH,
Paul Davis
Cheers
Jan
--
Also: Fucking.
On Thu, Jan 29, 2009 at 5:42 PM, Jan Lehnardt <[email protected]>wrote:
Hi list,

Alex Lang and I have been working on a statistics module this week.
We'd like to share our intermediate results and discuss furthersteps
and acceptance of the module into the CouchDB codebase. The code
lives on GitHub:

http://github.com/janl/couchdb/tree/stats
A full diff against trunk from a few hours ago lives onFriendpaste:
http://friendpaste.com/3IfIyRv5EzXEnPyk7S9xqe

Be gentle :)
The driving idea was to make it easy for 3rd party monitoringsolutionsto pick up runtime statistics of a single CouchDB node. The statsmoduledoes not collect long-term stats persistently. It does not drawpretty
graphs
and it does not come with a pony.
The stats module is rather simple, it consists of two Erlangmodules,a collector and an aggregator. The collector is a `gen_server`holdingand `ets` table that holds integer values associated with keys.For now
these are simple counters. All CouchDB modules can send a message
to the collector with any key they want to get a metric counted.E.g.
`couch_httpd`:

couch_stats_collector:increment({httpd, requests}).

to count the total number of requests made.

We plan to add support for minimum and maximum values of
different metrics soon.

The aggregator has two jobs: It exposes the raw counter values from
the collector to the HTTP plugin (`couch_httpd_stats_handlers.erl`)
and to aggregate counter values to more meaningful numbers like
average requests per second over 60 seconds, 300 seconds, 900
seconds (cron-style).

Reading values is done through a single API endpoint `/_stats`. The
canonical format is `/_stats/modulename/statkey` e.g.:

/_stats/httpd/requests

will return a JSON string:

{"httpd": {"requests":3241}}

We'll have a combined view for all stats under `GET /_stats`.


A few notes:

- Stats URLs are read-only.
- This is a first iteration, we can add as many fancy stats andmetrics
as
we like, we went with the ones that felt most useful for us.
- The `{module, key}` identifiers for stats are not yetfinalized, we
realized
early on that we'd like to have a fair number of stats beforelooking
into a good naming scheme. We don't have a good naming scheme
yet, recommendations are very welcome.

- We'll add more inline-comments to key parts of the code.
- We're not yet fully integrated into the build system (seebelow). eunitoffers a variety of options to conditionally compile code fortesting
and
moving test code to separate files if that is preferred.
- The way we calculate averages works, but is a bit clunky andcan be
improved. Also, different kinds of averages can be added.
- Stats are lost on server shutdown or when restartServer(); iscalled
wich
the JavaScript test suite does.
- Some lines exceed the soft limit of 79 chars. We'll addressthat in the
future, if required.

- Basic testing shows no significant slowdowns so far. Proper load
testing
has not been done yet.

Test Driven
- We took a test driven approach developing this. Alex is anexpert on
TDD (with high success rates, but that's another story). We added
an extensive number of tests to `couch_tests.js` that differ instylefrom the existing tests and are more oriented towards provenpractices.They are also more verbose and try to test only one thing at atime.
We hope you like the new style and we'd like to nudge the devs into
taking over our style. The infrastructure for the new-style testsis not
final and we can still change things around, if needed.

- For things that were hard or impossible to test through the REST
API we added eunit-style tests to our core modules that ensuretheir
functionality. We added a temporary make target `make t` to launch
our tests. You need Erlang/OTP R12B-5 for that or an existing euint
installation. We'd like to work with Noah to integrate that intothe
existing `make test` target (and integrate the existing bits).


Review
We're looking for a validation of our architectural approach aswell asintegrating our `increment` calls into the existing CouchDBcodebase.
Legal
We're releasing the source under the Apache 2.0 license. I (Jan)havea CLA on file, Alex does not. If I remember correctly, my CLA isenoughto get the code into the hands of the ASF. Alex will receivehonourable
mentions in our THANKS file. :)

The stats module has been developed under a contract with the BBC.
They need to be able to integrate CouchDB with their existingmonitoringsolution and we hope this will get them there. The BBC has nointerest
in maintaining the code as a patch and has subsequently allowed us
to release the code as open source under the Apache 2.0 license and
asked if we could offer it to the CouchDB community for potential
inclusion.
By ASF law there still might have to a occur a proper softwaregrant by
the BBC, but take that as a given.


Next Steps

- We'd like to hammer out any architectural issues you might find.

- We'll work on a decent naming scheme for stats.

- We'll clean up code, add tests, and comments.

- We'll add more stats and the missing bits we mentioned earlier.

- We might end up adding SNMP support.

- Once in some shape (and with all legal issues cleared), we'd like
to move the code to an ASF subversion branch and finish integration
into trunk from there. Hopefully before 0.9.


Thanks for your time,

Cheers
Alex & Jan
--
PS from my PMC-hat: I think the stats module is a valuable addition
to CouchDB and I welcome any effort to integrate it.

Re: Statistics Module

Reply via email to