Any other ideas for telemetry protocols? If there are none, I will rebase my telemetry patch back to the original version based on statsd.
LZ

On Tue, Oct 31, 2017 at 8:33 PM, Lukas Zapletal <[email protected]> wrote:
> Hello,
>
> I am looking for an application instrumentation protocol for the
> Foreman Rails application that fulfills the following requirements:
>
> The protocol must work with multi-process servers like Passenger.
> The protocol can be easily integrated into Foreman Tasks and Smart Proxy.
> The protocol or agent must support aggregation of time-based data
> (quantiles, average).
> The protocol must integrate with the top three open-source monitoring frameworks.
>
> Let me summarize my findings so far. I am looking for advice or
> comments on this topic. I have already worked on some prototypes, but
> before I commit to a final solution, I want to be sure I am not
> missing something I don't know about.
>
> Before you send comments, please keep in mind that I am not searching
> for a monitoring solution to integrate with. I want an application
> instrumentation library (or protocol) to be able to export measurements
> (or telemetry data if you like) from Rails (like the number of requests
> processed, SQL queries, time spent in the db or a view, time spent
> rendering a template or calling a backend system).
>
> Prometheus
>
> Flexible text-based protocol (alternatively protobuf) with HTTP
> REST-like communication. It was designed to be pull-based, meaning
> that an agent makes HTTP calls to the web application, which holds all
> metrics until they are flushed. It was built for the Prometheus
> monitoring framework (Apache licensed), created initially by
> SoundCloud. The server and most agents are written in Go, and it can
> run without an external database or export into 3rd party storage
> backends.
>
> It looks great, but it has a major problem - the Ruby client library
> (called client_ruby) does not support multi-process web servers at
> all. There are some hacks, but these use local temp files or
> shared memory with rather bad benchmark results (see the links down
> below).
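
To illustrate the multi-process problem in general terms, here is a minimal sketch. It is not client_ruby code: InMemoryCounter and simulate_workers are hypothetical names, and it assumes a platform where fork is available. Each forked worker increments its own copy of the counter, so the process that would answer a scrape never sees those increments.

```ruby
# Hypothetical stand-in for a metrics registry that lives in process memory.
class InMemoryCounter
  attr_reader :value

  def initialize
    @value = 0
  end

  def increment
    @value += 1
  end
end

# Fork a few "workers" (as a multi-process server would), let each one count
# its requests, then return what the parent process sees afterwards.
def simulate_workers(counter, workers: 3, requests_per_worker: 5)
  pids = Array.new(workers) do
    fork do
      # The child works on its own copy of the counter.
      requests_per_worker.times { counter.increment }
      exit!(0) # leave the child immediately, skipping at_exit handlers
    end
  end
  pids.each { |pid| Process.wait(pid) }
  counter.value
end
```

With the defaults, simulate_workers(InMemoryCounter.new) returns 0 in the parent even though the workers performed 15 increments between them, which is why some shared store or an external aggregator is needed.
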
>
> There is a possibility to push metrics into a separate component
> called PushGateway, but this was created for things like cron jobs or
> rake tasks. Doing multiple HTTP requests, one for each metric, per
> single app request is unlikely to perform well. In the README the
> authors note that this should be considered a "temporary solution".
>
> Although Prometheus seems to have a vibrant community, the Ruby library
> development pace slowed down as SoundCloud "does not use many Ruby
> apps anymore". But it is still a good option to have.
>
> https://prometheus.io
> https://prometheus.io/docs/instrumenting/pushing/
> https://github.com/prometheus/client_ruby
> https://github.com/prometheus/client_ruby/issues/9
> https://github.com/prometheus/client_ruby/commits/multiprocess
>
> OpenTSDB
>
> OpenTSDB consists of a Time Series Daemon (TSD) as well as a set of
> command line utilities. Interaction with OpenTSDB is primarily
> achieved by running one or more of the TSDs. Each TSD is independent.
> There is no master and no shared state, so you can run as many TSDs as
> required to handle any load you throw at it. Each TSD uses the open
> source database Hadoop/HBase or the hosted Google Bigtable service to
> store and retrieve time-series data.
>
> It uses a push mechanism via a REST JSON API with an alternative
> "telnet-like" text endpoint. Although it does have some agents, it is
> used more as a storage backend than as an end-to-end monitoring
> solution.
>
> http://opentsdb.net/overview.html
>
> Statsd
>
> The main idea behind this instrumentation protocol is simple - get the
> measurement out of the application as fast as possible using a UDP
> datagram. A collector agent usually runs locally; it does aggregation
> and relays the measurements to the target backend system. The vanilla
> version does not support tagging, but there are extensions or mappings
> possible to support that.
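
For reference, the plain statsd line format is <name>:<value>|<type>, sent as one UDP datagram per metric. Below is a minimal sketch using only Ruby's standard library. It is not the statsd-instrument API: the UdpMetrics class, the statsd_line helper and the metric names are hypothetical, and port 8125 is assumed because it is the usual statsd default.

```ruby
require "socket"

# Build one metric in the plain statsd line format: <name>:<value>|<type>.
# Common types: "c" (counter), "ms" (timer), "g" (gauge).
def statsd_line(name, value, type)
  "#{name}:#{value}|#{type}"
end

# Hypothetical fire-and-forget sender: one UDP datagram per metric.
class UdpMetrics
  def initialize(host = "127.0.0.1", port = 8125)
    @host = host
    @port = port
    @socket = UDPSocket.new
  end

  def increment(name, by = 1)
    deliver(statsd_line(name, by, "c"))
  end

  def timing(name, milliseconds)
    deliver(statsd_line(name, milliseconds, "ms"))
  end

  private

  # Socket errors are swallowed on purpose so that a missing or broken
  # collector never interrupts the application; returns false in that case.
  def deliver(payload)
    @socket.send(payload, 0, @host, @port)
    true
  rescue SystemCallError
    false
  end
end
```

A caller would do something like UdpMetrics.new.increment("foreman.requests") at the end of a request. Aggregation (averages, quantiles) is left entirely to the collector agent.
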
>
> Almost all monitoring platforms have some kind of
> agent/importer/exporter that talks via statsd. The original statsd
> daemon was written in Perl years ago, then it was re-popularized by
> the node.js implementation, but there are many alternative agents, of
> which the most promising is statsite, which is very easy to extend.
>
> This protocol is my favourite because it plays well with multiprocess
> Ruby servers and other Foreman components (all can just send UDP
> packets to localhost) and it also takes all aggregation and storing of
> temporary data out of the Ruby application. It also reduces the chance
> of regressions in our codebase to a bare minimum - in the worst case
> the aggregating agent can fail, but UDP packets will simply get lost
> without interrupting the application. The best Ruby client library
> seems to be statsd-instrument, actively maintained by Shopify. It is
> very small and has no runtime dependencies.
>
> https://github.com/etsy/statsd/blob/master/docs/metric_types.md
> https://github.com/Shopify/statsd-instrument
> https://github.com/prometheus/statsd_exporter
> https://github.com/statsite/statsite
> https://codeascraft.com/2011/02/15/measure-anything-measure-everything/
>
> New Relic, Instrumental, DataDog, Rollbar
>
> All are paid services. Some clients are open-source (Instrumental is
> MIT licensed), but usually with poorly documented protocols and worse
> integration with different monitoring solutions. There are plenty of
> similar offerings, I might have missed some here.
>
> https://newrelic.com
> https://instrumentalapp.com
> https://instrumentalapp.com/docs/tcp-collector
>
> Zabbix, Nagios, Icinga
>
> These are more "alerting" systems (a system or service is down) and
> they all support application instrumentation to some degree, but it is
> not the core of what they do. I have seen them referred to as "legacy
> monitoring systems", but I think they are still very relevant. They
> are not a good fit for my use case at all, though.
>
> Conclusion
>
> To me, statsd looks like the most open and flexible protocol. This
> will give our users the greatest flexibility for further
> integration - there are plenty of generic agents which can relay data
> to backend systems.
>
> Comments?
>
> --
> Later,
> Lukas @lzap Zapletal

--
Later,
Lukas @lzap Zapletal
