LGTM. Only a few questions:
1. Since the available metrics are built into APISIX, I think we
don't need to let users configure the metric name. Even if they
configure a "request_blah", there won't be a request_blah metric.
2. DogStatsD already has a namespace concept, see
https://docs.datadoghq.com/developers/dogstatsd/?tab=hostagent, so we
can use the namespace instead of a self-invented prefix.
3. As we don't allow users to select the metrics (at least in the
early version), it would be better to move the tags & sample rate out
of the metric level.
4. As the host and other stuff are not route-specific, we can put them
into metadata configuration.

Here is the configuration I suggest:
```
"datadog": {
    "sample_rate": 1.0
}
```

```
# metadata
"datadog": {
    "host": "192.168.47.149",
    "port": 8125,
    "namespace": "apisix.dev"
}
```
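
To make the split concrete, here is a rough Python sketch (not APISIX code) of how the agent-wide metadata and the per-route plugin conf could be combined at request time. The function name `resolve_conf`, the `EffectiveConf` type and the default values are my assumptions for illustration only; nothing here is decided yet.

```python
from dataclasses import dataclass

# Assumed defaults, for illustration only; the thread has not fixed them.
METADATA_DEFAULTS = {"host": "127.0.0.1", "port": 8125, "namespace": "apisix"}
PLUGIN_DEFAULTS = {"sample_rate": 1.0}


@dataclass(frozen=True)
class EffectiveConf:
    host: str
    port: int
    namespace: str
    sample_rate: float


def resolve_conf(metadata: dict, plugin_conf: dict) -> EffectiveConf:
    """Combine agent-wide metadata with the per-route plugin conf."""
    meta = {**METADATA_DEFAULTS, **metadata}    # not route-specific
    route = {**PLUGIN_DEFAULTS, **plugin_conf}  # set per route/service/consumer
    return EffectiveConf(
        host=meta["host"],
        port=meta["port"],
        namespace=meta["namespace"],
        sample_rate=route["sample_rate"],
    )


conf = resolve_conf(
    {"host": "192.168.47.149", "port": 8125, "namespace": "apisix.dev"},
    {"sample_rate": 1.0},
)
print(conf)
```

With this shape, a route only needs `"datadog": {}` to enable the plugin, and the agent address is changed in one place.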

As for the scope of metrics, we can refactor the Prometheus code so
that both plugins share the same mechanism for collecting request-level
data. The type of a Prometheus metric will map to the stat_type, and
its labels will map to the tags.
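
A small Python sketch of that mapping, only to illustrate the idea: it turns a Prometheus-style sample into a DogStatsD datagram of the form `name:value|type|@sample_rate|#tag:value,...`. The function name `to_datagram`, the metric name `request.counter` and the choice of `h` for Prometheus histograms are assumptions on my side, not the final design.

```python
# Assumed mapping from Prometheus metric type to DogStatsD type code.
PROM_TO_DOGSTATSD = {"counter": "c", "gauge": "g", "histogram": "h"}


def to_datagram(namespace, name, prom_type, value, labels=None, sample_rate=1.0):
    """Format one sample as a DogStatsD datagram string."""
    stat_type = PROM_TO_DOGSTATSD[prom_type]
    # The namespace is prepended to the metric name, separated by a dot.
    parts = [f"{namespace}.{name}:{value}", stat_type]
    if sample_rate < 1.0:
        # The sample rate is only sent when sampling is actually applied.
        parts.append(f"@{sample_rate}")
    if labels:
        # Prometheus labels become key:value tags (sorted for stable output).
        tags = ",".join(f"{k}:{v}" for k, v in sorted(labels.items()))
        parts.append(f"#{tags}")
    return "|".join(parts)


line = to_datagram(
    "apisix.dev",
    "request.counter",  # hypothetical metric name
    "counter",
    1,
    labels={"route_name": "datadog-plugin-route", "status_code": 200},
)
print(line)
# apisix.dev.request.counter:1|c|#route_name:datadog-plugin-route,status_code:200
```

The resulting string is what would be written to the UDP socket of the DogStatsD agent.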

Bisakh Mondal <[email protected]> wrote on Fri, Oct 29, 2021 at 2:52 PM:
>
> Hi Community,
>
>    Metrics reflect the real-time usage or behaviour of a system. This
> proposal aims to incorporate a plugin for Datadog, a widely used
> observability solution, into Apache APISIX. End users could use the plugin
> by setting up a Datadog agent themselves and providing the DogStatsD IP
> address and port as the plugin conf. DogStatsD, which comes bundled with
> the Datadog agent, is an implementation of the StatsD protocol to which
> our application (APISIX) will send different metrics for different events
> over a UDP socket.
>
>
> Metrics that we are going to support [metric_name]: The following metrics
> will be logged (if enabled) to the Datadog server.
>
>    - request_count: Tracks number of requests for that
>    service/route/consumer. [TYPE: COUNT]
>    - latency
>    - upstream_latency
>    - request_body_size (in Bytes)
>    - response_body_size (in Bytes)
>
> [Feel free to suggest new ones.]
>
>
> As additional info, we are also going to log { "route_name": "name of the
> route (if any)", "uri": "request URI (redundant, should we keep it?)",
> "service_name": "name of the service (if any)", "consumer_id": "id",
> "status_code": "response code (HTTP/gRPC)" } with additional tags.
>
>
> Plugin-name: "datadog"
>
>
> The configs that can be used when enabling the plugin are:
>
>    - host: DogStatsD agent host (default: 0.0.0.0)
>    - port: DogStatsD agent port (default: 8125)
>    - metrics: type (list) [
>        {
>          - name: identifier
>          - metric_type: Name of the enabled metric. One of [request_count,
>            latency ... ]
>          - stat_type: one of [COUNT | GAUGE | SET | HISTOGRAM | DISTRIBUTION]
>          - sample_rate: float, optional (valid for COUNT, GAUGE)
>          - tags: additional static tags {}
>          - prefix: string (default apisix)
>          - namespace: string [Should we add it?]
>
>          // The metric name logged in DogStatsD will be
>          // (prefix.metric_type.name.stat_type)
>        },
>        {}, {} ...
>      ]
>
>
> As we already have a clean segregation of plugin selection priority
> [consumer > route > service], the logging will apply only to the entity
> on which the plugin is enabled. If it is enabled globally, every
> request will be logged.
>
>
> An example route with datadog plugin enabled will look like this
>
>
> ```
>
> {
>     "uri": "/index.html",
>     "name": "datadog-plugin-route",
>     "plugins": {
>         "datadog": {
>             "host": "192.168.47.149",
>             "port": 8125,
>             "metrics": [
>                 {
>                     "name": "index_page",
>                     "metric_type": "request_count",
>                     "stat_type": "count",
>                     "prefix": "apisix.mycompany",
>                     "namespace": "default",
>                     "sample_rate": 1.0,
>                     "tags": {
>                         "mytag1":"val1",
>                         "mytag2":"val2"
>                     }
>                 },
>                 {...}
>             ]
>         }
>     },
>     "upstream": {
>         "type": "roundrobin",
>         "nodes": {
>             "39.97.63.215:80": 1
>         }
>     }
> }
>
> ```
>
>
> I am going to work on them starting with "request_count" and then add
> the rest one by one.
>
> Please let me know if you have any suggestions, improvements, or
> modifications. I'll be happy to incorporate them.
>
>
> Thank you!
>
>
> Best Regards,
>
> Bisakh <https://github.com/bisakhmondal>
