Hi folks,

As people are starting to join this mailing list (and others
can follow the discussion at
https://lists.apache.org/list.html?dev@warble.apache.org ), I thought
I'd kick off a discussion, and a bit of discovery, around the idea
behind, the goals of, and the design of Apache Warble. I've been going
back and forth with a couple of people on the project (mostly ChrisT and
Pono) about what we envision Warble will be and do, and now I'd like to
bring this to the mailing list for further discussion and, ultimately,
a consensus.

The principal thesis in this work is "write the docs, then write the
code". We'll start by defining and formalizing what we aim to achieve
and how we want this to come about, and THEN we'll write the code to get
there. Writing the code first, and then maaaybe writing the
documentation later is tempting as heck, but generally a terrible idea,
so we're going to be a bit strict about this. If you don't have
documentation and tests ready, your code doesn't go into master!

GENERAL IDEA AND GOAL:
(This is the general description of what Apache Warble will
ultimately do, and how it will be achieved.)

    Apache Warble is a turnkey monitoring solution with a modular,
    threaded design, allowing for both public telemetry (data pull) from
    easily deployable nodes around the world and internal monitoring
    (data push) on machines, with a centralized master server for
    managing the nodes and agents and for collecting, aggregating,
    alerting on, and visualizing the data.

    The monitoring nodes (public telemetry) and agents (internal
    monitoring) share a modular design in which new custom tests can be
    written and added quickly. The master, nodes and agents are all
    built with Python 3.
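To make "quickly write and add new custom tests" a bit more concrete, here is a rough sketch of what a test plugin registry could look like in Python 3. It assumes tests are plain functions that take a parameter dict and return a result dict; every name in it (TEST_REGISTRY, the warble_test decorator, run_test and the tcp_port_open example) is hypothetical and made up for illustration, not a decided API:

```python
import socket
from typing import Callable, Dict

# Hypothetical registry mapping a test name to the function that runs it.
TEST_REGISTRY: Dict[str, Callable[[dict], dict]] = {}


def warble_test(name: str):
    """Hypothetical decorator that registers a custom test under `name`."""
    def register(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        if name in TEST_REGISTRY:
            raise ValueError(f"test {name!r} is already registered")
        TEST_REGISTRY[name] = func
        return func
    return register


@warble_test("tcp_port_open")
def tcp_port_open(params: dict) -> dict:
    """Example test: can we open a TCP connection to params['host']:params['port']?"""
    try:
        with socket.create_connection(
            (params["host"], params["port"]), timeout=params.get("timeout", 5)
        ):
            return {"ok": True}
    except OSError as err:
        return {"ok": False, "error": str(err)}


def run_test(name: str, params: dict) -> dict:
    """Look up a registered test by name and run it with the given parameters."""
    test = TEST_REGISTRY.get(name)
    if test is None:
        return {"ok": False, "error": f"unknown test {name!r}"}
    return test(params)
```

Adding a new test would then just mean writing one decorated function in a new module; the node or agent never needs to know about it in advance.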

DESIGN:
Apache Warble will consist of three main components, of which
the master component is further split into three sub-components:

  - Warble Master Server:
    - Master API Service (UI + Node management)
    - Database
    - Alerting daemon
  - Warble Nodes (for public-/intra-telemetry)
  - Warble Agents (for internal machine/VM monitoring)

The general idea is visualized at: https://i.imgur.com/4DZqcWy.png

A *node* can be a VM or a container, whose only purpose is to:
    - register with the master as a node
    - receive test targets and parameters (which host to test, which
      tests to perform)
    - run the tests
    - send test reports back to the master
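The node lifecycle above (register, receive targets, run tests, report back) could be sketched roughly like this. It is only a sketch under assumptions: the master is abstracted behind a hypothetical MasterClient interface (register, fetch_tasks, submit_report), tasks are assumed to be dicts with "target", "tests" and optional "params" keys, and InMemoryMaster is a stand-in for local experimentation, not the real transport:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol


class MasterClient(Protocol):
    """Hypothetical interface a node uses to talk to the master."""
    def register(self, node_id: str) -> str: ...
    def fetch_tasks(self, token: str) -> List[dict]: ...
    def submit_report(self, token: str, report: dict) -> None: ...


@dataclass
class Node:
    node_id: str
    master: MasterClient
    tests: Dict[str, Callable[[str, dict], dict]]  # test name -> fn(target, params)
    token: str = ""

    def run_once(self) -> int:
        """One cycle: register if needed, fetch tasks, run tests, report. Returns reports sent."""
        if not self.token:
            self.token = self.master.register(self.node_id)
        sent = 0
        for task in self.master.fetch_tasks(self.token):
            target = task["target"]
            for test_name in task["tests"]:
                test = self.tests.get(test_name)
                if test is None:
                    result = {"ok": False, "error": f"unknown test {test_name!r}"}
                else:
                    result = test(target, task.get("params", {}))
                self.master.submit_report(self.token, {
                    "node": self.node_id,
                    "target": target,
                    "test": test_name,
                    "result": result,
                })
                sent += 1
        return sent


@dataclass
class InMemoryMaster:
    """Stand-in for the real master, for local experimentation only."""
    tasks: List[dict]
    reports: List[dict] = field(default_factory=list)
    registered: List[str] = field(default_factory=list)

    def register(self, node_id: str) -> str:
        self.registered.append(node_id)
        return f"token-{node_id}"

    def fetch_tasks(self, token: str) -> List[dict]:
        return list(self.tasks)

    def submit_report(self, token: str, report: dict) -> None:
        self.reports.append(report)


if __name__ == "__main__":
    master = InMemoryMaster(tasks=[{"target": "example.org", "tests": ["always_up"]}])
    node = Node("node-eu-1", master, {"always_up": lambda host, params: {"ok": True}})
    print(node.run_once(), master.reports)
```

Keeping the transport behind an interface like this would let the node logic be tested without a running master, whatever wire protocol we end up choosing.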

An *agent* is a program that can be installed on any existing
machine/VM/container, and monitors internal parameters on that host,
such as disk usage, IOPS, network traffic, CPU usage, memory and so on.
What an agent monitors is controlled from the master in the same way as
for a node, the key difference being that whereas a node monitors
_other_ hosts, an agent monitors only the host it is deployed on.
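As an illustration of the agent side, here is a minimal sketch using only the Python standard library to collect two of the local metrics mentioned above. The COLLECTORS mapping and the idea that the master assigns checks by name are assumptions for illustration; note that os.getloadavg() only exists on Unix-like systems, so the sketch reports it as unavailable elsewhere:

```python
import os
import shutil
import time


def collect_disk_usage(path: str = "/") -> dict:
    """Disk usage of the filesystem containing `path`, in bytes and as a percentage."""
    usage = shutil.disk_usage(path)
    return {
        "path": path,
        "total_bytes": usage.total,
        "used_bytes": usage.used,
        "free_bytes": usage.free,
        "used_percent": round(100.0 * usage.used / usage.total, 1) if usage.total else 0.0,
    }


def collect_load_average() -> dict:
    """1/5/15-minute load averages; os.getloadavg() is Unix-only and may also raise OSError."""
    try:
        one, five, fifteen = os.getloadavg()
    except (AttributeError, OSError):
        return {"available": False}
    return {"available": True, "load1": one, "load5": five, "load15": fifteen}


# Hypothetical mapping from check names (as assigned by the master) to local collectors.
COLLECTORS = {
    "disk_usage": collect_disk_usage,
    "load_average": collect_load_average,
}


def run_assigned_checks(check_names):
    """Run the checks the master assigned to this host; unknown names are skipped."""
    return {
        "timestamp": time.time(),
        "metrics": {name: COLLECTORS[name]() for name in check_names if name in COLLECTORS},
    }
```

The resulting dict would then be pushed to the master, matching the "data push" model for internal monitoring.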

The *master* server is where all the orchestration happens. Ideally
we'll build it with a multi-master option, so there is a fallback during
outages. This box registers nodes and agents, and will have a UI for
setting up tests, viewing reports, and alerting when something happens.
I envision this being a WSGI server with an OpenAPI-conforming JSON API,
wrapped by an HTML/JavaScript UI; we can likely just grab a
stripped-down version of Kibble and reuse that.
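For the WSGI + JSON API idea, a bare-bones sketch using only the standard library (json, uuid, wsgiref) might look like the following. The two endpoints (POST /api/node/register and GET /api/nodes) and the in-memory NODES dict are hypothetical placeholders; a real implementation would follow whatever OpenAPI spec we agree on and persist to the database:

```python
import json
import uuid
from wsgiref.simple_server import make_server

# Hypothetical in-memory node registry; the real master would persist this in its database.
NODES = {}


def json_response(start_response, status, payload):
    """Serialize payload as JSON and send it with the given HTTP status line."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def application(environ, start_response):
    """WSGI callable with two hypothetical endpoints: node registration and node listing."""
    method = environ.get("REQUEST_METHOD", "GET")
    path = environ.get("PATH_INFO", "")

    if method == "POST" and path == "/api/node/register":
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0
        try:
            data = json.loads(environ["wsgi.input"].read(length) or b"{}")
        except ValueError:  # covers json.JSONDecodeError and undecodable bytes
            data = None
        if not isinstance(data, dict):
            return json_response(start_response, "400 Bad Request", {"error": "expected a JSON object"})
        node_id = str(uuid.uuid4())
        NODES[node_id] = {"hostname": data.get("hostname", "unknown")}
        return json_response(start_response, "201 Created", {"node_id": node_id})

    if method == "GET" and path == "/api/nodes":
        return json_response(start_response, "200 OK", {"nodes": NODES})

    return json_response(start_response, "404 Not Found", {"error": "no such endpoint"})


if __name__ == "__main__":
    # wsgiref's server is for local testing only, not for production use.
    with make_server("127.0.0.1", 8080, application) as httpd:
        httpd.serve_forever()
```

Since the UI would only talk to this JSON API, swapping in a stripped-down Kibble front end should not affect the server side.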

The *alerter daemon*, while still part of the master server, runs as a
separate process, because alerting is a continuous (proactive) process,
whereas the rest of the master server is reactive and only acts when
nodes, agents or users contact it.
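One way to see why the alerter has to be proactive: a node or agent that simply stops reporting never sends a request, so a purely reactive server would never notice. A minimal sketch of such a polling loop follows; the function names, the "last seen" timestamp source, and the staleness rule are all hypothetical, chosen only to illustrate the proactive pattern:

```python
import time
from typing import Callable, Dict, List


def find_stale_sources(last_seen: Dict[str, float], now: float, max_age: float) -> List[str]:
    """Return the IDs of nodes/agents whose last report is older than max_age seconds."""
    return sorted(source for source, ts in last_seen.items() if now - ts > max_age)


def alerter_loop(get_last_seen: Callable[[], Dict[str, float]],
                 send_alert: Callable[[str], None],
                 max_age: float = 300.0,
                 interval: float = 30.0,
                 iterations: int = -1) -> None:
    """Poll on a fixed schedule and alert once per source each time it goes stale.

    iterations=-1 runs forever, as a daemon would; a positive value is handy for testing.
    """
    alerted = set()
    while iterations != 0:
        stale = set(find_stale_sources(get_last_seen(), time.time(), max_age))
        for source in sorted(stale - alerted):
            send_alert(f"{source} has not reported for over {max_age:.0f}s")
        alerted = stale  # sources that recovered are dropped, so they can alert again later
        if iterations > 0:
            iterations -= 1
        if iterations != 0:
            time.sleep(interval)
```

In practice get_last_seen would read from whatever store we pick (possibly the Redis cache mentioned below), and send_alert would hand off to e-mail or similar.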

As for the database, I'm leaning towards using Elasticsearch for
permanent storage, and possibly Redis as the ephemeral lookup cache for
the alerter.


So, this is the general stuff so far. Feedback, comments, ideas,
whatever, are very much welcome at this stage, and once we have some
consensus on this, we can start hammering out some documentation and
subsequently some code. We are going to get tests and whatnot
pseudo-donated from Quenda (pushed as new code instead of a grant, as
the original source cannot be granted as is), which will speed up the
coding phase by a fair bit. Once we agree on the general principles,
we can start discussing each component separately, and decide on
approaches for each of them.

With regards,
Daniel.

