On Wed, Nov 1, 2017 at 3:51 AM, Lukas Zapletal <l...@redhat.com> wrote:

> Statsd can be configured for remote transport, meaning that the
> collecting agent (or aggregating process if you like) can run on a
> remote server (or container). It is recommended to run it either on
> localhost or at least on the LAN; it is not a good idea to route the
> UDP packets through complex environments though, as they can get lost.
> Creating a SPOF is not a good idea either, but I've seen articles and
> comments about having one central statsd collector for all hosts.
> Those people usually had questions around scalability because the
> single point of entry was getting overloaded.
>
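
Just to illustrate how cheap the statsd path is, here is a minimal sketch in
plain Ruby, assuming a collector listening on the default UDP port 8125 (the
host and metric names are placeholders, not anything we ship):

    require 'socket'

    STATSD_HOST = 'statsd.example.com'  # assumed collector address
    STATSD_PORT = 8125                  # default statsd UDP port

    sock = UDPSocket.new

    # One datagram per measurement, wire format "<metric>:<value>|<type>"
    sock.send('foreman.http_requests:1|c',      0, STATSD_HOST, STATSD_PORT)  # counter
    sock.send('foreman.request_duration:42|ms', 0, STATSD_HOST, STATSD_PORT)  # timer in ms

It is fire-and-forget: a single sendto() per sample, which is also why a
dropped datagram only costs one data point.
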
> There are some WIP patches for Prometheus as well that would make it
> possible to have a single HTTP REST endpoint for all subprocesses of a
> Rails app server, but if you take a look at them (links are in my
> original email) they are pretty hacky. One creates a local shared
> memory block for communication, the other does the same via a
> serialized db file. That is dozens of system calls per single
> measurement; compared to just one or two for a UDP datagram this is
> way too much IMHO.
>
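
To make the contrast concrete, the pull model boils down to each process
exposing its own counters over HTTP in the Prometheus text format. A rough
sketch using only stdlib WEBrick (metric name and port made up; this is not
the actual WIP patch, just an illustration of the single-endpoint model):

    require 'webrick'

    # In-process counters - each Rails/Passenger worker would have its own
    # copy, which is exactly why those patches resort to shared memory or an
    # on-disk file to aggregate across workers.
    COUNTERS = Hash.new(0)
    COUNTERS['foreman_http_requests_total'] += 1

    server = WEBrick::HTTPServer.new(Port: 9292)
    server.mount_proc '/metrics' do |_req, res|
      res['Content-Type'] = 'text/plain'
      res.body = COUNTERS.map { |name, value| "#{name} #{value}\n" }.join
    end
    server.start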

Is it only multi-process Rails web servers that Prometheus doesn't work
with? Does it work for a single-process, multi-threaded web server? This is
an interesting roadblock, given you'd expect it to affect lots of web
servers across multiple languages out there.


>
> The question though is whether there is another protocol I am not
> aware of. There are actually two, both of which I have tested, to be
> honest:
>
> 1) PCP trace API - http://pcp.io/man/man3/pmdatrace.3.html
>
> PCP is a monitoring and collection daemon which is in most Linux
> distros (and in RHEL as well), and it has a very simple API which uses
> a TCP connection to communicate with the trace agent (called the trace
> PMDA). I wrote a Ruby wrapper around this simple API
> (https://github.com/lzap/ruby-pcptrace) and I have a working
> prototype. The disadvantage is that in the PCP world this API is seen
> as legacy and might get removed in the future. Also, aggregation is
> only done for the transaction type of observation.
>
> 2) PCP MMV API - http://pcp.io/books/PCP_PG/html/id5213288nat.html
>
> Another agent, this one using memory-mapped files for ultra-fast
> communication. This is the fastest application instrumentation I've
> seen, but it is a bit of overkill, primarily targeted at HPC
> environments. Also, no aggregation is done and there are no Ruby
> bindings at all. In both cases, a PCP daemon needs to be running.
>
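
On option 1: I haven't checked which method names ruby-pcptrace actually
exposes, so treat the calls below as hypothetical placeholders, but the
underlying C trace API is basically pmtracebegin(3)/pmtraceend(3) for
transactions plus pmtracepoint(3) for one-shot events, all sent to the trace
PMDA over its TCP connection:

    # Hypothetical require path and method names - see
    # https://github.com/lzap/ruby-pcptrace for the real API.
    require 'pcptrace'

    PCPTrace.begin('foreman.host_update')   # pmtracebegin(3): start a transaction
    sleep 0.05                              # the work being measured
    PCPTrace.end('foreman.host_update')     # pmtraceend(3): stop; the trace PMDA aggregates
    PCPTrace.point('foreman.cache_miss')    # pmtracepoint(3): one-shot event, no aggregation
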
> One question though - isn't it standard practice to have one container
> per pod that serves as the monitoring endpoint? I am no expert with
> Kubernetes, but I believe that's exactly what this technology is built
> for - you can specify services and their dependencies on each other.
> The price we need to pay (an extra service) is balanced by better
> reliability - I can imagine that when Rails/Passenger stops responding
> you won't be able to reach the monitoring endpoint either, thus we'd
> need to maintain a separate web stack for that.
>

Yes, standard practice is to think about one container per pod (in a
Kubernetes environment). However, there are patterns for things like log
aggregation and monitoring, such as running a sidecar container that ensures
co-location. The part I don't entirely get with sidecars is that if I scale
the pod to, say, 5, I get 5 web applications and 5 monitoring containers,
and that seems odd. Which is why I think the tendency is towards models
where your single process/application is the endpoint for your metrics to be
scraped by an outside agent or service.

I agree you want the collector to be separate, but if your web application
is down, what value would a monitoring endpoint being alive provide? The
application would be down, thus no metrics to serve up. The other exporters,
such as the one exporting metrics about the underlying system, would be
responsible for providing system metrics. In the Kube world, this is handled
by readiness and liveness probes that let Kubernetes re-spin the container
if it stops responding.
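
For example, a probe block in the pod spec would look roughly like this
(image name, path and port are placeholders, not an actual Foreman config):

    containers:
    - name: foreman
      image: foreman/foreman:latest
      livenessProbe:             # kubelet restarts the container if this fails
        httpGet:
          path: /ping
          port: 3000
        initialDelaySeconds: 30
        periodSeconds: 10
      readinessProbe:            # traffic is only routed once this passes
        httpGet:
          path: /ping
          port: 3000
        periodSeconds: 5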


>
> --
> Later,
>   Lukas @lzap Zapletal
>



-- 
Eric D. Helms
Red Hat Engineering
