Thanks, Jan, for the thoughtful suggestions. We will definitely take them into account.

1. We will use one endpoint to consolidate _system, _stats and _active_tasks.
2. I lean toward /_metrics for that endpoint because Prometheus can consume it directly.
3. On per-node versus per-cluster: this is easy for _system and _stats because they already map to a node. For _active_tasks, my investigation shows that we provide node information for “indexer” and “replication” tasks, so we can use that to give a per-node response. If a new task type is introduced later, we can revisit this.
4. The response format looks like the hot topic. I would like to return JSON by default (when there is neither an “Accept” header nor a query parameter) and return the Prometheus format when a header or query parameter requests it.

We will come back with more proposals after some experiments.

Thanks to all,
Peng Hui

> On Sep 24, 2020, at 2:55 AM, Jan Lehnardt <j...@apache.org> wrote:
>
>> On 23. Sep 2020, at 18:25, Richard Ellis <ricel...@uk.ibm.com> wrote:
>>
>>> so we should absolutely make this info available in JSON
>>
>> This sounds like a good idea to me
>>
>>> we could fall back to a ?accept=prometheus option
>>
>> I'm opposed to adding endpoints that supply different content-type
>> responses via non-standard means. The CouchDB API has some examples of
>> this through history and it can make using those endpoints with standard
>> tooling somewhat painful.
>
> I don’t suggest making the non-standard approach the only implementation,
> we should also accept an Accept header and leave the query string option
> as an escape hatch for envs that can’t header.
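To make the negotiation discussed above concrete, here is a minimal sketch in Python (not CouchDB's Erlang code). The function name `choose_format`, the `CONTENT_TYPES` table, and the rule that the query parameter takes precedence over the header are all assumptions for illustration; only the JSON default, the `?accept=prometheus` idea, and the "text/plain; version=0.0.4" content type come from this thread. Quality values (`q=`) in the Accept header are ignored for simplicity.

```python
# Hypothetical sketch of format selection for a metrics endpoint.
# Names and precedence rules are assumptions, not CouchDB's implementation.

CONTENT_TYPES = {
    "json": "application/json",
    "prometheus": "text/plain; version=0.0.4",
}


def choose_format(accept_header=None, query_accept=None):
    """Return "json" or "prometheus" for a request.

    accept_header: raw value of the HTTP Accept header, or None.
    query_accept:  value of a hypothetical ?accept= query parameter, or None.
    """
    # Assumption: the query parameter is an escape hatch for clients that
    # cannot set headers, so it wins when present.
    if query_accept:
        if query_accept.strip().lower() == "prometheus":
            return "prometheus"
        return "json"

    # No header and no query parameter: JSON by default.
    if not accept_header:
        return "json"

    # Take the first media range we recognise; q-values are ignored here.
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type == "application/json":
            return "json"
        if media_type == "text/plain":
            return "prometheus"

    # Anything else (for example "*/*") falls back to the default.
    return "json"
```

With these assumptions, `choose_format("text/plain; version=0.0.4")` selects the Prometheus format, while a request with no header and no query parameter gets JSON.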
>
>> A bit of quick searching seems to suggest that the format has its own
>> project https://openmetrics.io/ - and this declares its text
>> representation linking back to
>> https://prometheus.io/docs/instrumenting/exposition_formats/#text-based-format
>> which declares a Content-Type of "text/plain; version=0.0.4" - so
>> defaulting to that, but following Joan's suggestion and switching to JSON
>> for a supplied Accept:application/json in the standard way seems like a
>> good choice to me.
>>
>> Rich
>>
>> From: Jan Lehnardt <j...@apache.org>
>> To: dev@couchdb.apache.org
>> Cc: "Gesellchen, Tobias" <tobias.gesellc...@europace.de>
>> Date: 23/09/2020 16:42
>> Subject: [EXTERNAL] Re: [DISCUSS] Prometheus endpoint in CouchDB 4.x
>>
>> Hi all,
>>
>> a few things to consider:
>>
>> 1. The idea of unifying our “get runtime info about CouchDB” endpoints
>> into one is solid, as it is always weird to make sure you know which info
>> you get where. We see this specifically in support engagements, where it
>> is always awkward to ask for the results of multiple endpoints.
>>
>> 2. This directly leads to the question about what the endpoint should be
>> called. I feel if it is a new endpoint, we should give it a new name.
>> _info maybe, but feel free to bike shed away.
>>
>> 3. Next the question about per-node and per-cluster info/metrics/activity
>> on the endpoint. It might be convenient to be able to ask any one node
>> about what is going on in the entire cluster, rather than any one node,
>> but some stats only make sense in the context of a single node. Maybe the
>> result includes everything separated by node somehow.
>>
>> 4. Then the format: if this wasn’t about Prometheus and its custom format,
>> we wouldn’t discuss any of this and just use JSON. Since we *do* want to
>> target Prometheus with this, we have to talk about the format. Any of the
>> above is useful for non-Prometheus consumers, so we should absolutely make
>> this info available in JSON. And we can *also* send it in the Prometheus
>> format. The “correct” HTTP-way of doing this would be to use the Accept
>> header on the new endpoint, as Joan points out, but that’s often not an
>> option, so we could fall back to a ?accept=prometheus option. This would
>> also leave us open to add more formats in the future, as new standards
>> arise.
>>
>> 5. That leads us to whether we want to do this. Every five or so years,
>> new standards for these types of systems arise, and sometimes it is worth
>> incorporating them (like we finally do with the SystemD compatible log
>> formatter) and sometimes it is not and folks write tools to convert from
>> our HTTP/JSON standard to whatever they need
>> (https://github.com/gesellix/couchdb-prometheus-exporter).
>>
>> 6. We could also just bundle this exporter (although it is written in Go,
>> which we currently don’t have as a dependency).
>>
>> * * *
>>
>> Personally, I think the Prometheus format is widely enough used to warrant
>> inclusion, as long as we do it tastefully. I think a new endpoint with an
>> additional ?accept= or similar URL-level override for the format would be
>> a pragmatic, if not entirely *neat* approach. If we can build this all in
>> Erlang, the better, if we wanna shortcut dev time and bundle the Go
>> project, I might be more hesitant. On the per-node-or-per-cluster
>> question, I don’t know enough about the Prometheus format and whether it
>> allows us to send the equivalent of {nodes: { “node1”: {…}, “node2”: {…},
>> “node3”: {…} }}, or whether it demands per-node output, in which case
>> _active_tasks might get a bit awkward.
>>
>> Best
>> Jan
>>
>> --
>> Professional Support for Apache CouchDB:
>> https://neighbourhood.ie/couchdb-support/
>>
>> 24/7 Observation for your CouchDB Instances:
>> https://opservatory.app
>>
>>> On 22. Sep 2020, at 14:55, jiangph <jiangpeng...@hotmail.com> wrote:
>>>
>>> Hey all,
>>>
>>> We would like to add a Prometheus metrics endpoint for CouchDB and
>>> wanted to see if the community would be interested in us contributing
>>> this to CouchDB 4.x.
>>>
>>> Prometheus is a CNCF open-source project and the Prometheus metrics
>>> endpoint format is supported by many monitoring tools. Its data model is
>>> based around having a metric name which then contains a label name and a
>>> label value:
>>>
>>> <metric name>{<label name>=<label value>, ...}
>>>
>>> And it supports the Counter, Gauge, Histogram, and Summary metric types.
>>>
>>> The idea for the new Prometheus endpoint, /_metrics, would be that the
>>> endpoint is a consolidation of the _stats [1], _system [2], and
>>> _active_tasks [3] endpoints.
>>>
>>> For _stats and _system, the conversion from JSON to Prometheus-based
>>> format seems to be straightforward.
>>>
>>> JSON format:
>>> {
>>>   "value": {
>>>     "min": 0,
>>>     "max": 0,
>>>     "arithmetic_mean": 0,
>>>     "geometric_mean": 0,
>>>     "harmonic_mean": 0,
>>>     "median": 0,
>>>     "variance": 0,
>>>     "standard_deviation": 0,
>>>     ...
>>>     "percentile": [
>>>       [50, 0],
>>>       [75, 0],
>>>       [90, 0],
>>>       [95, 0],
>>>       [99, 0],
>>>       [999, 0]
>>>     ],
>>>     "histogram": [
>>>       [0, 0]
>>>     ]
>>>   }
>>> }
>>>
>>> Prometheus-based format:
>>>
>>> couchdb_stats{value="min"} 0
>>> couchdb_stats{value="max"} 0
>>> couchdb_stats{value="percentile50"} 0
>>> couchdb_stats{value="percentile75"} 0
>>> couchdb_stats{value="percentile95"} 0
>>>
>>> For _active_tasks, the change will be a bit more complicated, and some
>>> fields will be added to labels and tags.
>>>
>>> JSON format:
>>>
>>> {
>>>   "checkpointed_source_seq": 68585,
>>>   "continuous": false,
>>>   "doc_id": null,
>>>   "doc_write_failures": 0,
>>>   "docs_read": 4524,
>>>   "docs_written": 4524,
>>>   "missing_revisions_found": 4524,
>>>   "pid": "<0.1538.5>",
>>>   "progress": 44,
>>>   "replication_id": "9bc1727d74d49d9e157e260bb8bbd1d5",
>>>   "revisions_checked": 4524,
>>>   "source": "mailbox",
>>>   "source_seq": 154419,
>>>   "started_on": 1376116644,
>>>   "target": "http://mailsrv:5984/mailbox",
>>>   "type": "replication",
>>>   "updated_on": 1376116651
>>> }
>>>
>>> Prometheus-based format would look something like:
>>>
>>> couchdb_active_task{type="replication", source="mailbox", target="http://mailsrv:5984/mailbox", docs_count="docs_read"} 4524
>>> couchdb_active_task{type="replication", source="mailbox", target="http://mailsrv:5984/mailbox", docs_count="docs_written"} 4524
>>> couchdb_active_task{type="replication", source="mailbox", target="http://mailsrv:5984/mailbox", docs_count="missing_revisions_found"} 4524
>>>
>>> Best regards,
>>> Garren Smith
>>> Peng Hui Jiang
>>>
>>> [1] https://docs.couchdb.org/en/latest/api/server/common.html#node-node-name-stats
>>> [2] https://docs.couchdb.org/en/latest/api/server/common.html#node-node-name-system
>>> [3] https://docs.couchdb.org/en/latest/api/server/common.html#active-tasks
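As a rough illustration of the _stats conversion quoted in this thread, the following Python sketch turns one "value" object into lines shaped like the couchdb_stats examples. It is not the proposed Erlang implementation: the function name `stats_to_prometheus` is hypothetical, and skipping non-scalar fields such as "histogram" is an assumption, since the thread does not say how those would be exported.

```python
# Hypothetical sketch: render one _stats-style "value" object as
# Prometheus text lines, following the couchdb_stats examples in the thread.

def stats_to_prometheus(metric, value):
    """Return Prometheus text-format lines for a dict of summary statistics."""
    lines = []
    for key, val in value.items():
        if key == "percentile":
            # Each entry is a [percentile, observed_value] pair.
            for pct, observed in val:
                lines.append(f'{metric}{{value="percentile{pct}"}} {observed}')
        elif isinstance(val, (int, float)) and not isinstance(val, bool):
            # Plain numeric fields such as "min" and "max" map one to one.
            lines.append(f'{metric}{{value="{key}"}} {val}')
        # Assumption: other nested fields (for example "histogram") are skipped.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = {
        "min": 0,
        "max": 0,
        "percentile": [[50, 0], [75, 0]],
        "histogram": [[0, 0]],
    }
    print(stats_to_prometheus("couchdb_stats", sample), end="")
```

The same labelling idea would extend to _active_tasks, where label values such as the target URL would also need escaping according to the exposition format rules.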