Looks like we can move sample_rate out of the plan. As you said, a sample_rate at the global level is not very helpful. Maybe we should avoid adding it prematurely, until we have more experience in this area.
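
For reference, here is a minimal sketch of why the sample rate is a per-metric matter in DogStatsD: the rate travels on each datagram as `|@<rate>`, in the format `<name>:<value>|<type>|@<rate>|#<tags>`. The helper names and the metric names below are hypothetical and only for illustration; this is not the plugin's code.

```python
import random
from typing import Mapping, Optional


def format_datagram(name: str, value: float, stat_type: str,
                    sample_rate: float = 1.0,
                    tags: Optional[Mapping[str, str]] = None,
                    namespace: Optional[str] = None) -> str:
    """Build one DogStatsD datagram (hypothetical helper, illustration only)."""
    # The namespace is just a client-side prefix on the metric name.
    full_name = f"{namespace}.{name}" if namespace else name
    parts = [f"{full_name}:{value}", stat_type]
    # The sample rate is attached to this datagram only, not to a service.
    if sample_rate < 1.0:
        parts.append(f"@{sample_rate}")
    if tags:
        # Sort the tags so the output is deterministic.
        parts.append("#" + ",".join(f"{k}:{v}" for k, v in sorted(tags.items())))
    return "|".join(parts)


def should_send(sample_rate: float, rng: random.Random) -> bool:
    """Client-side sampling: send roughly sample_rate of the datagrams."""
    return rng.random() < sample_rate


# Hypothetical metric names; the real names depend on the plugin.
print(format_datagram("request.counter", 1, "c", 0.8,
                      {"route_name": "r1"}, "apisix.dev"))
print(format_datagram("request.latency", 12, "h"))
```

Since the rate is carried per datagram, a single global value would apply the same rate to every metric we emit, which is why it seems safe to leave it out for now.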

Zexuan Luo <[email protected]> wrote on Sat, Oct 30, 2021 at 10:39 AM:
>
> > The reason I say so is that let's assume there are multiple routes &
> > services which as a whole on a bigger picture indicates a certain product
> > (Product A). Adding custom tags ({"product": "A"}) at metric level makes it
> > easy for the end-user to query how the whole product is doing as a whole in
> > this case.
>
> Yes. A custom tag could be helpful. The only concern is that I think
> adding it now may be premature. As I said,
>
> > 3. As we don't allow users to select the metrics (at least for the
> > early version), it would be better to put the tags & sample rate out
> > of metric level.
>
> As we don't have a metric-level configuration in the early version,
> there is no place to put metric-level tags. Adding a constant tag at
> the global level may not be so helpful. Maybe we can do it in the
> future.
>
> > Also, let's assume there are two services, S-A & S-B where S-A gets huge
> > amount of load (huge num of requests - lots of metrics) and it's a stable
> > service, so in that case, sample_rate of 100% for that service may not be
> > ideal for the user (maybe 80% is sufficient) compared to the sample_rate:1
> > for S_B. (Remember: we don't have to work extra to tackle the sample_rate,
> > everything will be handled by DogStatsD).
>
> If I'm not mistaken, DogStatsD handles the sample_rate at the
> metric level. The sample_rate is configured per metric; there is no
> service-level sample_rate. So we can't configure it per service.
>
> Bisakh Mondal <[email protected]> wrote on Fri, Oct 29, 2021 at 8:31 PM:
> >
> > Thanks for the thoughtful reply, Spacewander.
> > So to make it easy for the user we are going to provide every possible
> > metrics if the plugin is enabled - sounds really good.
> >
> > However, on point 3, you have suggested using `sample_rate` & `tags`
> > outside of the metric level. IMHO, I think that keeping it inside
> > schema would be a handy option for the users.
> > The reason I say so is that let's assume there are multiple routes &
> > services which as a whole on a bigger picture indicates a certain product
> > (Product A). Adding custom tags ({"product": "A"}) at metric level makes it
> > easy for the end-user to query how the whole product is doing as a whole in
> > this case. (Just one example)
> > Also, let's assume there are two services, S-A & S-B where S-A gets huge
> > amount of load (huge num of requests - lots of metrics) and it's a stable
> > service, so in that case, sample_rate of 100% for that service may not be
> > ideal for the user (maybe 80% is sufficient) compared to the sample_rate:1
> > for S_B. (Remember: we don't have to work extra to tackle the sample_rate,
> > everything will be handled by DogStatsD). So I think providing the feature
> > is optimal.
> >
> > Please let me know what you think. Thank you : )
> >
> > Best regards,
> > Bisakh.
> >
> > On Fri, 29 Oct 2021 at 16:41, Zexuan Luo <[email protected]> wrote:
> >
> > > I missed one. The sample_rate is also not route-specific, so we can
> > > put it into the metadata.
> > >
> > > Zexuan Luo <[email protected]> wrote on Fri, Oct 29, 2021 at 6:01 PM:
> > > >
> > > > LGTM. Only a few questions:
> > > > 1. Since the available metrics are built into APISIX, I think we
> > > > don't need to allow users to configure the metric name. Even if they
> > > > configure a "request_blah", there won't be a request_blah.
> > > > 2. dogstatsd already has a namespace concept, see
> > > > https://docs.datadoghq.com/developers/dogstatsd/?tab=hostagent, so we
> > > > can use namespace instead of a self-invented prefix.
> > > > 3. As we don't allow users to select the metrics (at least for the
> > > > early version), it would be better to put the tags & sample rate out
> > > > of metric level.
> > > > 4. As the host and other stuff are not route-specific, we can put them
> > > > into metadata configuration.
> > > >
> > > > Here is the configuration I suggest:
> > > > ```
> > > > "datadog": {
> > > >     "sample_rate": 1.0
> > > > }
> > > > ```
> > > >
> > > > ```
> > > > # metadata
> > > > "datadog": {
> > > >     "host": "192.168.47.149",
> > > >     "port": 8125,
> > > >     "namespace": "apisix.dev"
> > > > }
> > > > ```
> > > >
> > > > As for the scope of metrics, we can refactor the Prometheus code, let
> > > > them share the same mechanism to collect request-level data. The type
> > > > of Prometheus metrics will be the stat_type, and the labels will be
> > > > the tag.
> > > >
> > > > Bisakh Mondal <[email protected]> wrote on Fri, Oct 29, 2021 at 2:52 PM:
> > > > >
> > > > > Hi Community,
> > > > >
> > > > > Metrics reflect the real-time usage or behaviour of a system. This
> > > > > proposal proposes to incorporate a plugin for Datadog, a widely used
> > > > > observability solution, into Apache APISIX. End users could use the plugin
> > > > > by setting up a Datadog agent themselves and providing the DogStatsD IP and
> > > > > port address as a plugin conf. DogStatsD, which comes bundled with the
> > > > > Datadog agent, is an implementation of the StatsD protocol where our
> > > > > application (APISIX) will send different metrics for different events over
> > > > > the UDP socket.
> > > > >
> > > > >
> > > > > Metrics that we are going to support [metric_name]: Following metrics will
> > > > > be logged (if enabled) to the Datadog server.
> > > > >
> > > > > - request_count: Tracks number of requests for that
> > > > >   service/route/consumer. [TYPE: COUNT]
> > > > > - latency
> > > > > - upstream_latency
> > > > > - request_body_size (in Bytes)
> > > > > - response_body_size (in Bytes)
> > > > >
> > > > > [Feel free to suggest new ones.]
> > > > >
> > > > >
> > > > > As additional info, we are also going to log { "route_name": "name of the
> > > > > route (if any)", "uri": "request URI (redundant, should we keep it?)",
> > > > > "service_name": "name of the service (if any)", "consumer_id": "id",
> > > > > "status_code": "response code (HTTP/grpc)" } with additional tags.
> > > > >
> > > > >
> > > > > Plugin-name: "datadog"
> > > > >
> > > > >
> > > > > So the configs that could be used while enabling the plugin are
> > > > >
> > > > > - host: DogStatsD agent host (default: 0.0.0.0)
> > > > > - port: DogStatsD agent port (default: 8125)
> > > > > - metrics: type (list) [
> > > > >
> > > > > {
> > > > >
> > > > > - name: identifier
> > > > > - metric_type: Name of the enabled metric. One of [request_count,
> > > > >   latency ... ]
> > > > > - stat_type: one of [COUNT | GAUGE | SET | HISTOGRAM |
> > > > >   DISTRIBUTION]
> > > > > - sample_rate: float Optional (valid for COUNT, GAUGE)
> > > > > - tags: additional static tags {}
> > > > > - prefix: string (default apisix)
> > > > > - namespace: string [Should we add it ?]
> > > > >
> > > > > // The metric name logged in DogStatsd will be
> > > > > // (prefix.metric_type.name.stat_type)
> > > > >
> > > > > },
> > > > >
> > > > > {}, {} ...
> > > > >
> > > > > ]
> > > > >
> > > > >
> > > > > As we already have clean segregation of plugin selection priority
> > > > > [consumer > route > service], depending upon where it is enabled the logging
> > > > > will be particular to that entity only. If it is enabled globally, every
> > > > > request will be logged.
> > > > >
> > > > >
> > > > > An example route with datadog plugin enabled will look like this
> > > > >
> > > > >
> > > > > ```
> > > > >
> > > > > {
> > > > >     "uri": "/index.html",
> > > > >     "name": "datadog-plugin-route",
> > > > >     "plugins": {
> > > > >         "datadog": {
> > > > >             "host": "192.168.47.149",
> > > > >             "port": 8125,
> > > > >             "metrics": [
> > > > >                 {
> > > > >                     "name": "index_page",
> > > > >                     "metric_type": "request_count",
> > > > >                     "stat_type": "count",
> > > > >                     "prefix": "apisix.mycompany",
> > > > >                     "namespace": "default",
> > > > >                     "sample_rate": 1.0,
> > > > >                     "tags": {
> > > > >                         "mytag1":"val1",
> > > > >                         "mytag2":"val2"
> > > > >                     }
> > > > >                 },
> > > > >                 {...}
> > > > >             ]
> > > > >         }
> > > > >     },
> > > > >     "upstream": {
> > > > >         "type": "roundrobin",
> > > > >         "nodes": {
> > > > >             "39.97.63.215:80": 1
> > > > >         }
> > > > >     }
> > > > > }
> > > > >
> > > > > ```
> > > > >
> > > > >
> > > > > I am going to work on them starting with "request_count" and one by one
> > > > > gradually.
> > > > >
> > > > > Please let me know if you have any suggestions, improvements,
> > > > > modifications. I'll be happy to incorporate them.
> > > > >
> > > > >
> > > > > Thank you!
> > > > >
> > > > >
> > > > > Best Regards,
> > > > >
> > > > > Bisakh <https://github.com/bisakhmondal>
> > >
