Re: [Proposal ] Datadog Plugin into Apache APISIX for metrics collection

Zexuan Luo Fri, 29 Oct 2021 19:46:24 -0700

Looks like we can drop sample_rate from the plan. As you said,
sample_rate at the global level is not very helpful. Maybe we should
hold off on it until we have more experience in this area.
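
For context on why sample_rate is a per-metric concern, here is a rough
sketch (not APISIX code) of the DogStatsD datagram format, where the
sample rate is an optional `|@rate` field on each metric line. The
function names `format_datagram` and `should_send`, and the metric name
and tag used below, are made up for illustration. It assumes the usual
StatsD convention that the client decides whether to send a sample and
reports the rate it used.

```python
import random

# DogStatsD single-letter type codes for the stat types in the proposal.
TYPE_CODES = {
    "count": "c",
    "gauge": "g",
    "set": "s",
    "histogram": "h",
    "distribution": "d",
}


def format_datagram(namespace, name, value, stat_type, sample_rate=1.0, tags=None):
    """Build one datagram: <namespace>.<name>:<value>|<type>[|@<rate>][|#<tags>]."""
    line = f"{namespace}.{name}:{value}|{TYPE_CODES[stat_type]}"
    if sample_rate < 1.0:
        # The rate travels with this one metric line, not with the connection.
        line += f"|@{sample_rate}"
    if tags:
        # Tags are comma-separated key:value pairs; sorted for stable output.
        line += "|#" + ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
    return line


def should_send(sample_rate, rng=random.random):
    """Client-side sampling: send roughly `sample_rate` of the samples."""
    return sample_rate >= 1.0 or rng() < sample_rate


if __name__ == "__main__":
    # Hypothetical metric name and tag, using the namespace from the metadata example.
    print(format_datagram("apisix.dev", "request.counter", 1, "count",
                          sample_rate=0.8, tags={"route_name": "r1"}))
```

Each datagram would then be written to the agent's UDP socket (host and
port from the metadata), which is why a rate attaches naturally to an
individual metric rather than to a service.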


On Sat, 30 Oct 2021 at 10:39, Zexuan Luo <[email protected]> wrote:
>
> > The reason I say so is that let's assume there are multiple routes &
> > services which as a whole on a bigger picture indicates a certain product
> > (Product A). Adding custom tags ({"product": "A"}) at metric level makes it
> > easy for the end-user to query how the whole product is doing as a whole in
> > this case.
>
> Yes. A custom tag could be helpful. The only concern is that I think
> adding it now may be premature. As I said,
>
> > 3. As we don't allow users to select the metrics (at least for the
> > early version), it would be better to put the tags & sample rate out
> > of metric level.
>
> As we don't have a metric-level configuration in the early version,
> there is no place to put metric-level tags. Adding a constant tag at
> the global level may not be very helpful. Maybe we can do it in the
> future.
>
> > Also, let's assume there are two services, S-A & S-B where S-A gets huge
> > amount of load (huge num of requests - lots of metrics) and it's a stable
> > service, so in that case, sample_rate of 100% for that service may not be
> > ideal for the user (maybe 80% is sufficient) compared to the sample_rate:1
> > for S-B. (Remember: we don't have to work extra to tackle the sample_rate,
> > everything will be handled by DogStatsD).
>
> If I'm not mistaken, DogStatsD handles the sample_rate at the
> metric level. The sample_rate is configured per metric; there is no
> service-level sample_rate, so we can't configure it per service.
>
> On Fri, 29 Oct 2021 at 20:31, Bisakh Mondal <[email protected]> wrote:
> >
> > Thanks for the thoughtful reply, Spacewander.
> > So to make it easy for the user we are going to provide every available
> > metric if the plugin is enabled - sounds really good.
> >
> > However, on point 3, you have suggested using `sample_rate` & `tags`
> > outside of the metric level. IMHO, I think that keeping it inside
> > schema would be a handy option for the users.
> > The reason I say so is that let's assume there are multiple routes &
> > services which as a whole on a bigger picture indicates a certain product
> > (Product A). Adding custom tags ({"product": "A"}) at metric level makes it
> > easy for the end-user to query how the whole product is doing as a whole in
> > this case. (Just one example)
> > Also, let's assume there are two services, S-A & S-B where S-A gets huge
> > amount of load (huge num of requests - lots of metrics) and it's a stable
> > service, so in that case, sample_rate of 100% for that service may not be
> > ideal for the user (maybe 80% is sufficient) compared to the sample_rate:1
> > for S-B. (Remember: we don't have to work extra to tackle the sample_rate,
> > everything will be handled by DogStatsD). So I think providing the feature
> > is optimal.
> >
> > Please let me know, what you think. Thank you : )
> >
> > Best regards,
> > Bisakh.
> >
> > On Fri, 29 Oct 2021 at 16:41, Zexuan Luo <[email protected]> wrote:
> >
> > > I missed one. The sample_rate is also not route-specific, so we can
> > > put it into the metadata.
> > >
> > > On Fri, 29 Oct 2021 at 18:01, Zexuan Luo <[email protected]> wrote:
> > > >
> > > > LGTM. Only a few questions:
> > > > 1. Since the available metrics are built into APISIX, I think we
> > > > don't need to allow users to configure the metric name. Even if they
> > > > configure a "request_blah", there won't be a request_blah.
> > > > 2. DogStatsD already has a namespace concept, see
> > > > https://docs.datadoghq.com/developers/dogstatsd/?tab=hostagent, so we
> > > > can use namespace instead of a self-invented prefix.
> > > > 3. As we don't allow users to select the metrics (at least for the
> > > > early version), it would be better to put the tags & sample rate out
> > > > of metric level.
> > > > 4. As the host and other stuff are not route-specific, we can put them
> > > > into metadata configuration.
> > > >
> > > > Here is the configuration I suggest:
> > > > ```
> > > > "datadog": {
> > > >             "sample_rate": 1.0
> > > >  }
> > > > ```
> > > >
> > > > ```
> > > > # metadata
> > > > "datadog": {
> > > >             "host": "192.168.47.149",
> > > >             "port": 8125,
> > > >             "namespace": "apisix.dev"
> > > >  }
> > > > ```
> > > >
> > > > As for the scope of metrics, we can refactor the Prometheus code, let
> > > > them share the same mechanism to collect request-level data. The type
> > > > of Prometheus metrics will be the stat_type, and the labels will be
> > > > the tag.
> > > >
> > > > On Fri, 29 Oct 2021 at 14:52, Bisakh Mondal <[email protected]> wrote:
> > > > >
> > > > > Hi Community,
> > > > >
> > > > >    Metrics reflect the real-time usage or behaviour of a system. This
> > > > > proposal proposes to incorporate a plugin for Datadog, a widely used
> > > > > observability solution, into Apache APISIX. End users could use the
> > > > > plugin by setting up a Datadog agent themselves and providing the
> > > > > DogStatsD IP address and port as plugin configuration. DogStatsD,
> > > > > which comes bundled with the Datadog agent, is an implementation of
> > > > > the StatsD protocol to which our application (APISIX) will send
> > > > > different metrics for different events over a UDP socket.
> > > > >
> > > > >
> > > > > Metrics that we are going to support [metric_name]: The following
> > > > > metrics will be logged (if enabled) to the Datadog server.
> > > > >
> > > > >    - request_count: Tracks number of requests for that
> > > > >    service/route/consumer. [TYPE: COUNT]
> > > > >    - latency
> > > > >    - upstream_latency
> > > > >    - request_body_size (in Bytes)
> > > > >    - response_body_size (in Bytes)
> > > > >
> > > > > [Feel free to suggest new ones.]
> > > > >
> > > > >
> > > > > As additional info, we are also going to log { "route_name": "name of
> > > > > the route (if any)", "uri": "request URI (redundant, should we keep
> > > > > it?)", "service_name": "name of the service (if any)",
> > > > > "consumer_id": "id", "status_code": "response code (HTTP/grpc)" }
> > > > > with additional tags.
> > > > >
> > > > >
> > > > > Plugin-name: "datadog"
> > > > >
> > > > >
> > > > > So the configs that could be used while enabling the plugin are:
> > > > >
> > > > >    - host: DogStatsD agent host (default: 0.0.0.0)
> > > > >    - port: DogStatsD agent port (default: 8125)
> > > > >    - metrics: type (list) [
> > > > >     {
> > > > >       - name: identifier
> > > > >       - metric_type: Name of the enabled metric. One of [request_count,
> > > > >         latency ... ]
> > > > >       - stat_type: one of [COUNT | GAUGE | SET | HISTOGRAM | DISTRIBUTION]
> > > > >       - sample_rate: float Optional (valid for COUNT, GAUGE)
> > > > >       - tags: additional static tags {}
> > > > >       - prefix: string (default apisix)
> > > > >       - namespace: string [Should we add it ?]
> > > > >
> > > > >       // The metric name logged in DogStatsD will be
> > > > >       // (prefix.metric_type.name.stat_type)
> > > > >     },
> > > > >     {}, {} ...
> > > > >   ]
> > > > >
> > > > >
> > > > > As we already have clean segregation of plugin selection priority
> > > > > [consumer > route > service], depending upon where it is enabled, the
> > > > > logging will be particular to that entity only. If it is enabled
> > > > > globally, every request will be logged.
> > > > >
> > > > >
> > > > > An example route with datadog plugin enabled will look like this
> > > > >
> > > > >
> > > > > ```
> > > > >
> > > > > {
> > > > >     "uri": "/index.html",
> > > > >     "name": "datadog-plugin-route",
> > > > >     "plugins": {
> > > > >         "datadog": {
> > > > >             "host": "192.168.47.149",
> > > > >             "port": 8125,
> > > > >             "metrics": [
> > > > >                 {
> > > > >                     "name": "index_page",
> > > > >                     "metric_type": "request_count",
> > > > >                     "stat_type": "count",
> > > > >                     "prefix": "apisix.mycompany",
> > > > >                     "namespace": "default",
> > > > >                     "sample_rate": 1.0,
> > > > >                     "tags": {
> > > > >                         "mytag1":"val1",
> > > > >                         "mytag2":"val2"
> > > > >                     }
> > > > >                 },
> > > > >                 {...}
> > > > >             ]
> > > > >         }
> > > > >     },
> > > > >     "upstream": {
> > > > >         "type": "roundrobin",
> > > > >         "nodes": {
> > > > >             "39.97.63.215:80": 1
> > > > >         }
> > > > >     }
> > > > > }
> > > > >
> > > > > ```
> > > > >
> > > > >
> > > > > I am going to work on them one by one, starting with
> > > > > "request_count".
> > > > >
> > > > > Please let me know if you have any suggestions, improvements,
> > > > > modifications. I'll be happy to incorporate them.
> > > > >
> > > > >
> > > > > Thank you!
> > > > >
> > > > >
> > > > > Best Regards,
> > > > >
> > > > > Bisakh <https://github.com/bisakhmondal>
> > >
