Hello,

I am looking for an application instrumentation protocol for the Foreman Rails application that fulfills the following requirements:

- The protocol must work with multi-process servers like Passenger.
- The protocol can be easily integrated into Foreman Tasks and Smart Proxy.
- The protocol or agent must support aggregation of time-based data (quantiles, average).
- The protocol must integrate with the top three open-source monitoring frameworks.

Let me summarize my findings so far. I am looking for advice or comments on this topic. I have already worked on some prototypes, but before I commit to a final solution, I want to be sure I am not missing something I don't know about.

Before you send comments, please keep in mind that I am not searching for a monitoring solution to integrate with. I want an application instrumentation library (or protocol) that can export measurements (or telemetry data, if you like) from Rails, such as the number of requests processed, SQL queries, time spent in the database or a view, and time spent rendering a template or calling a backend system.

Prometheus

Flexible text-based protocol (alternatively protobuf) with HTTP REST-like communication. It was designed to be pull-based, meaning that an agent makes HTTP calls to the web application, which holds all metrics until they are flushed. It was built for the Prometheus monitoring framework (Apache licensed), created initially by SoundCloud. The server and most agents are written in Go; the server can run without an external database or export into third-party storage backends.

It looks great, but it has a major problem: the Ruby client library (called client_ruby) does not support multi-process web servers at all. There are some hacks, but they use local temp files or shared memory, with rather bad benchmark results (see the links below). It is possible to push metrics into a separate component called PushGateway, but that was created for things like cron jobs or rake tasks. Making an HTTP request for each metric on every application request is unlikely to perform well. In the README, the authors note that this should be considered a "temporary solution".
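
To illustrate the pull model described above, here is a minimal sketch in plain Ruby. It does not use client_ruby; the COUNTERS registry, the increment and render_text helpers, and the metric name are hypothetical names made up for this example. Counters live in the memory of one worker process and are rendered in the Prometheus text format when a scrape arrives. With several Passenger workers, each process would hold its own copy of COUNTERS, so a scrape would only see the numbers of whichever worker answered it.

```ruby
# Hypothetical in-process registry: a Hash owned by a single worker process.
COUNTERS = Hash.new(0)

# Increment a counter identified by its name and a set of labels.
def increment(name, labels = {})
  COUNTERS[[name, labels]] += 1
end

# Render all counters in the Prometheus text exposition format,
# as the body a metrics endpoint would return to the scraper.
def render_text
  COUNTERS.group_by { |(name, _), _| name }.map do |name, samples|
    lines = ["# TYPE #{name} counter"]
    samples.each do |(_, labels), value|
      label_str = labels.map { |k, v| %(#{k}="#{v}") }.join(",")
      label_str = "{#{label_str}}" unless label_str.empty?
      lines << "#{name}#{label_str} #{value}"
    end
    lines.join("\n")
  end.join("\n") + "\n"
end
```

Calling increment("http_requests_total", controller: "hosts") on each request and serving render_text from a metrics endpoint would be enough for a single-process server; the multi-process case is where it breaks down.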

Although Prometheus seems to have a vibrant community, development of the Ruby library has slowed down because SoundCloud "does not use many Ruby apps anymore". But it is still a good option to have.

https://prometheus.io
https://prometheus.io/docs/instrumenting/pushing/
https://github.com/prometheus/client_ruby
https://github.com/prometheus/client_ruby/issues/9
https://github.com/prometheus/client_ruby/commits/multiprocess

OpenTSDB

OpenTSDB consists of a Time Series Daemon (TSD) as well as a set of command line utilities. Interaction with OpenTSDB is primarily achieved by running one or more of the TSDs. Each TSD is independent. There is no master and no shared state, so you can run as many TSDs as required to handle any load you throw at it. Each TSD uses the open source database Hadoop/HBase or the hosted Google Bigtable service to store and retrieve time-series data.

It uses a push mechanism via a REST JSON API, with an alternative "telnet-like" text endpoint. Although it does have some agents, it is used more as a storage backend than as an end-to-end monitoring solution.

http://opentsdb.net/overview.html

Statsd

The main idea behind this instrumentation protocol is simple: get the measurement out of the application as fast as possible using a UDP datagram. A collector agent usually runs locally; it does the aggregation and relays the measurements to the target backend system. The vanilla version does not support tagging, but there are extensions or mappings that make it possible. Almost all monitoring platforms have some kind of agent/importer/exporter that talks statsd.

The original statsd daemon was written in Perl years ago, then it was re-popularized by a node.js implementation, but there are many alternative agents, of which the most promising is statsite, which is very easy to extend.
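
To show how little code the statsd approach needs on the application side, here is a minimal sketch using only the Ruby standard library. It is not the statsd-instrument API; the helper names (statsd_line, statsd_send, statsd_measure) are hypothetical, and the collector address 127.0.0.1:8125 is an assumption (8125 is the commonly used statsd port). The wire format is the vanilla "name:value|type" line with types c (counter), ms (timer) and g (gauge).

```ruby
require "socket"

# Hypothetical helper: format one metric in the vanilla statsd line format.
# type is "c" (counter), "ms" (timer) or "g" (gauge).
def statsd_line(name, value, type)
  "#{name}:#{value}|#{type}"
end

# Fire-and-forget: send one datagram to the local collector.
# Any socket error is swallowed so instrumentation never breaks a request.
def statsd_send(line, host: "127.0.0.1", port: 8125)
  socket = UDPSocket.new
  socket.send(line, 0, host, port)
  true
rescue SystemCallError, SocketError
  false
ensure
  socket&.close
end

# Time a block with a monotonic clock and report it as a timer in milliseconds.
# Returns the block's own result so it can wrap existing code.
def statsd_measure(name, **opts)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed_ms = ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round
  statsd_send(statsd_line(name, elapsed_ms, "ms"), **opts)
  result
end
```

For example, statsd_send(statsd_line("foreman.requests", 1, "c")) would count a request, and statsd_measure("foreman.sql.query_time") { ... } would time a block (both metric names are made up). Every worker process sends its own datagrams, and the aggregation into averages or quantiles happens in the collector, not in Ruby.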

This protocol is my favourite because it plays well with multi-process Ruby servers and other Foreman components (all of them can just send UDP packets to localhost), and it also takes all aggregation and temporary data storage out of the Ruby application. It also reduces the chance of regressions in our codebase to a bare minimum: in the worst case the aggregating agent fails, but the UDP packets simply get lost without interrupting the application.

The best Ruby client library seems to be statsd-instrument, actively maintained by Shopify. It is very small and has no runtime dependencies.

https://github.com/etsy/statsd/blob/master/docs/metric_types.md
https://github.com/Shopify/statsd-instrument
https://github.com/prometheus/statsd_exporter
https://github.com/statsite/statsite
https://codeascraft.com/2011/02/15/measure-anything-measure-everything/

New Relic, Instrumental, DataDog, Rollbar

All are paid services. Some clients are open source (Instrumental is MIT licensed), but the protocols are usually not well documented and integrate worse with other monitoring solutions. There are plenty of similar offerings; I might have missed some here.

https://newrelic.com
https://instrumentalapp.com
https://instrumentalapp.com/docs/tcp-collector

Zabbix, Nagios, Icinga

These are more "alerting" systems (a system or service is down). They all support application instrumentation to some degree, but it is not the core of what they do. I have seen them referred to as "legacy monitoring systems", but I think they are still very relevant. They are not a good fit for my use case at all, though.

Conclusion

To me, statsd looks like the most open and flexible protocol. It will give our users the greatest flexibility for further integration, since there are plenty of generic agents that can relay data to backend systems.

Comments?

--
Later,
  Lukas @lzap Zapletal