> The initial step could be something simple,
> like request processing metrics and exposing the numbers via JMX.
>
> 1. What metrics are we interested in?
> 2. Who are the potential consumers of this data? Dashboards?
> 3. How do we want to expose the metrics?
> 4. Do we want to capture metrics at a service level (e.g., all requests
> made for WebHDFS)?
>

I would appreciate some JMX 'counters'. Consumers of other services
(NameNode, ResourceManager and NodeManager) already ingest their JMX data
into databases and dashboards such as Elasticsearch/Kibana or Solr/Banana,
so JMX from Knox could slide into those existing patterns and avoid having
to ship the audit log with Flume. That said, the audit log does provide
more detail on TPS per user and the endpoints each user is hitting.

I consider Knox to be a scale-up/scale-down service. If TPS can be
correlated with Knox load, then decisions can be made to spin up new Knox
VMs behind a load balancer to meet that demand dynamically.
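To make that concrete: TPS can be derived from two samples of a cumulative request counter (for example, one polled over JMX) and compared against what a single instance can sustain. This is only a hypothetical sketch, not Knox code; the class and method names, the 200 TPS per-instance capacity and the minimum of two instances are assumptions that would have to come from load testing.

```java
public class TpsScaler {

    /** Average transactions per second between two samples of a cumulative counter. */
    static double tps(long count0, long t0Millis, long count1, long t1Millis) {
        long deltaRequests = count1 - count0;
        long deltaMillis = t1Millis - t0Millis;
        if (deltaMillis <= 0) {
            throw new IllegalArgumentException("samples must be time-ordered");
        }
        if (deltaRequests < 0) {
            // A cumulative counter only goes backwards if it was reset between samples.
            throw new IllegalArgumentException("counter went backwards (was it reset?)");
        }
        return deltaRequests * 1000.0 / deltaMillis;
    }

    /** Instances needed if each can sustain maxTpsPerInstance (a hypothetical, measured capacity). */
    static int instancesNeeded(double tps, double maxTpsPerInstance, int minInstances) {
        int needed = (int) Math.ceil(tps / maxTpsPerInstance);
        return Math.max(minInstances, needed);
    }

    public static void main(String[] args) {
        // 30,000 requests over 60 seconds = 500 TPS.
        double rate = tps(120_000, 0, 150_000, 60_000);
        int instances = instancesNeeded(rate, 200.0, 2);
        System.out.println(rate + " TPS -> " + instances + " instances"); // prints "500.0 TPS -> 3 instances"
    }
}
```

Working from deltas of a cumulative counter, rather than having Knox compute a rate itself, leaves the sampling interval up to whatever dashboard or autoscaler polls JMX.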

As with TPS, byte-transfer counts, both per service and in aggregate, would
provide better data for a dynamic scaling decision. The way I currently use
Knox, other services are co-located on the same hosts, which makes it hard
to separate the traffic Knox is handling from the traffic of those
co-located services.

Per-topology and per-service metrics would be best.

Other metrics, as counts:
- Unsuccessful logins
- Successful logins where the overall response was nonetheless HTTP 500,
  which indicates a failure on the cluster side. An example would be users
  connecting to Knox with a valid AD username/password who are not
  authorized in the cluster. This can happen when the cluster is in secure
  mode but a service like Centrify has not allowed the user into the
  cluster's zone.
- Unsuccessful AD lookups by Knox (the user doesn't exist)
- Connections that used an auth cookie versus those that didn't and
  therefore resulted in an AD lookup
- Currently open connections

*These metrics wouldn't provide actionable intelligence on their own, but
they would build a pattern showing that something is wrong and that the
administrators should investigate.

Capability to reset the aggregate counters while Knox is running.
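As a rough illustration of what such counters could look like, here is a minimal sketch of a JMX MXBean with per-topology/service request and byte counts, named event counts and a reset operation. It is not Knox's actual implementation; the class names, the "topology/service" key format, the event names such as "login.failed" and the ObjectName are all hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class KnoxMetricsSketch {

    /** Management interface; the "MXBean" name suffix makes JMX publish it as an MXBean. */
    public interface GatewayMetricsMXBean {
        long getTotalRequests();                  // attribute "TotalRequests"
        long getTotalBytes();                     // attribute "TotalBytes"
        long getOpenConnections();                // attribute "OpenConnections" (a gauge)
        Map<String, Long> getRequestsByService(); // keyed "topology/service" (hypothetical)
        Map<String, Long> getBytesByService();
        Map<String, Long> getEventCounts();       // e.g. login failures, HTTP 500s
        void resetCounters();                     // operation, callable from JConsole etc.
    }

    public static class GatewayMetrics implements GatewayMetricsMXBean {
        private final Map<String, LongAdder> requests = new ConcurrentHashMap<>();
        private final Map<String, LongAdder> bytes = new ConcurrentHashMap<>();
        private final Map<String, LongAdder> events = new ConcurrentHashMap<>();
        private final AtomicLong openConnections = new AtomicLong();

        /** Called once per proxied request (a hypothetical hook in the gateway). */
        public void recordRequest(String topology, String service, long byteCount) {
            String key = topology + "/" + service;
            requests.computeIfAbsent(key, k -> new LongAdder()).increment();
            bytes.computeIfAbsent(key, k -> new LongAdder()).add(byteCount);
        }

        /** Counts a named event such as "login.failed" (event names are hypothetical). */
        public void recordEvent(String name) {
            events.computeIfAbsent(name, k -> new LongAdder()).increment();
        }

        public void connectionOpened() { openConnections.incrementAndGet(); }
        public void connectionClosed() { openConnections.decrementAndGet(); }

        @Override public long getTotalRequests() { return total(requests); }
        @Override public long getTotalBytes() { return total(bytes); }
        @Override public long getOpenConnections() { return openConnections.get(); }
        @Override public Map<String, Long> getRequestsByService() { return snapshot(requests); }
        @Override public Map<String, Long> getBytesByService() { return snapshot(bytes); }
        @Override public Map<String, Long> getEventCounts() { return snapshot(events); }

        /** Zeroes the cumulative counters; open connections are current state, so they are kept. */
        @Override public void resetCounters() {
            // LongAdder.reset() is not atomic with concurrent updates: an increment racing
            // with a reset may survive or be lost, which is tolerable for coarse counters.
            requests.values().forEach(LongAdder::reset);
            bytes.values().forEach(LongAdder::reset);
            events.values().forEach(LongAdder::reset);
        }

        private static long total(Map<String, LongAdder> counters) {
            return counters.values().stream().mapToLong(LongAdder::sum).sum();
        }

        /** Copies the live counters into a sorted map that JMX can expose as tabular data. */
        private static Map<String, Long> snapshot(Map<String, LongAdder> counters) {
            Map<String, Long> copy = new TreeMap<>();
            counters.forEach((key, adder) -> copy.put(key, adder.sum()));
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        GatewayMetrics metrics = new GatewayMetrics();
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("KnoxSketch:type=GatewayMetrics"); // hypothetical name
        server.registerMBean(metrics, name);

        metrics.connectionOpened();
        metrics.recordRequest("sandbox", "WEBHDFS", 4096);
        metrics.recordEvent("login.failed");
        System.out.println(server.getAttribute(name, "TotalRequests")); // prints 1

        // The reset an administrator would trigger remotely while the process keeps running.
        server.invoke(name, "resetCounters", new Object[0], new String[0]);
        System.out.println(server.getAttribute(name, "TotalRequests")); // prints 0
    }
}
```

Exposing the per-service maps as MXBean attributes lets generic JMX collectors pick them up as tabular data, while keeping open connections out of the reset reflects that it is a gauge rather than an aggregate counter.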

Kris
