Prior to adopting Metron, each adopting entity needs to estimate its data 
volume and data storage requirements so it can size its cluster properly.  
I propose the creation of an assessment tool that can plug into the Kafka 
topic for a given telemetry source and, over time, produce statistics on 
ingest volume and storage requirements.  The idea is that prior to adopting 
Metron, someone can set up all the feeds and Kafka topics, but instead of 
deploying Metron right away they would deploy this tool.  The tool would 
then produce statistics on data ingest and storage requirements, along with 
the other information needed for cluster sizing.
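
To make this concrete, below is a minimal sketch of what the ingest side of 
such a tool could look like.  The broker address, topic name, and the 
RunningStats helper (sketched further down, after the metrics list) are 
placeholders for illustration only, not existing Metron code.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class TelemetryProfiler {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");   // placeholder broker
        props.put("group.id", "telemetry-profiler");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
          consumer.subscribe(Collections.singletonList("bro"));  // placeholder topic
          RunningStats messageSize  = new RunningStats();  // bytes per message
          RunningStats eventsPerSec = new RunningStats();  // events per 1s window
          long windowStart = System.currentTimeMillis();
          long eventsInWindow = 0;
          while (true) {
            ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<byte[], byte[]> record : records) {
              eventsInWindow++;
              messageSize.add(record.value().length);
            }
            long now = System.currentTimeMillis();
            if (now - windowStart >= 1000) {               // close a one-second window
              eventsPerSec.add(eventsInWindow);
              eventsInWindow = 0;
              windowStart = now;
            }
          }
        }
      }
    }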

Some of the metrics that can be recorded are (a sketch of the running 
aggregation follows the list):

  *   Number of system events per second (mean, max, standard deviation)
  *   Message size (mean, max, standard deviation)
  *   Average number of ingest peaks
  *   Duration of peaks (mean, max, standard deviation)

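None of these metrics require buffering raw messages; running aggregates 
are enough.  Here is a minimal sketch of such an aggregator using Welford's 
online algorithm (RunningStats is just a name used across the sketches in 
this mail, not an existing class):

    /** Online mean / max / standard deviation without storing samples. */
    public class RunningStats {
      private long count = 0;
      private double mean = 0.0;
      private double m2 = 0.0;   // running sum of squared deviations from the mean
      private double max = Double.NEGATIVE_INFINITY;

      public void add(double x) {
        count++;
        max = Math.max(max, x);
        double delta = x - mean;
        mean += delta / count;        // Welford's incremental mean update
        m2 += delta * (x - mean);
      }

      public long getCount()  { return count; }
      public double getMean() { return mean; }
      public double getMax()  { return max; }
      public double getStdDev() {
        return count > 1 ? Math.sqrt(m2 / (count - 1)) : 0.0;  // sample std dev
      }
    }
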
If a parser for the telemetry already exists, the tool can produce 
additional statistics (sketched after the list below):

  *   Number of keys/fields parsed per message (mean, max, standard deviation)
  *   Length of parsed field values (mean, max, standard deviation)
  *   Length of parsed keys (mean, max, standard deviation)

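For example, assuming the parser output can be treated as a map of field 
names to values, the per-message field statistics could be gathered roughly 
like this (the class and method are illustrative only and reuse the 
RunningStats sketch above):

    import java.util.Map;

    public class ParsedMessageProfiler {
      private final RunningStats fieldCount  = new RunningStats();
      private final RunningStats keyLength   = new RunningStats();
      private final RunningStats valueLength = new RunningStats();

      /** Update key/field statistics from a single parsed message. */
      public void profile(Map<String, Object> parsed) {
        fieldCount.add(parsed.size());
        for (Map.Entry<String, Object> entry : parsed.entrySet()) {
          keyLength.add(entry.getKey().length());
          Object value = entry.getValue();
          valueLength.add(value == null ? 0 : value.toString().length());
        }
      }
    }
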
The tool could run for a week or a month and produce these statistics.  
Once the statistics are available we can put together guidance 
documentation on recommended cluster setup.  Without these metrics it is 
hard to properly size a cluster and set up streaming parallelism.


Thoughts/ideas?

Thanks,
James
