Makes sense.

-- 
George Vetticaden
Principal, Senior Product Manager for Metron
[email protected]
(630) 909-9138





On 4/13/16, 12:05 PM, "James Sirota" <[email protected]> wrote:

>Hi George, 
>
>This article describes micro-tuning of an existing cluster.  What I am
>proposing is a level up from that.  When you start with Metron, how do you
>even know how many nodes you need?  And of these nodes how many do you
>allocate to Storm, indexing, storage?  How much storage do you need?
>Tuning would be the next step in the process, but this tool would answer
>more fundamental questions about what a Metron deployment should look
>like given the number of telemetries and retention policies of the
>enterprise.
>
>The best way to get this data (in my opinion) is to have a tool that
>we can plug into Metron’s point of ingest (Kafka topics) and run for
>about a week or a month to collect and spit out the relevant metrics.
>Based on these metrics we can figure out the fundamental things about
>what a Metron deployment should look like.  Tuning would be the next
>step.
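>
>For illustration, here is a minimal sketch of what such a probe could
>look like: a plain Kafka consumer that samples message rate and size on
>one ingest topic (broker address, group id, topic name, and reporting
>interval are all hypothetical):
>
>    import java.util.Collections;
>    import java.util.Properties;
>    import org.apache.kafka.clients.consumer.ConsumerRecord;
>    import org.apache.kafka.clients.consumer.ConsumerRecords;
>    import org.apache.kafka.clients.consumer.KafkaConsumer;
>
>    public class IngestProbe {
>      public static void main(String[] args) {
>        Properties props = new Properties();
>        props.put("bootstrap.servers", "kafka:6667");    // hypothetical broker
>        props.put("group.id", "metron-assessment");      // hypothetical group id
>        props.put("key.deserializer",
>            "org.apache.kafka.common.serialization.ByteArrayDeserializer");
>        props.put("value.deserializer",
>            "org.apache.kafka.common.serialization.ByteArrayDeserializer");
>        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
>        consumer.subscribe(Collections.singletonList("bro"));  // telemetry topic under test
>
>        long count = 0, bytes = 0, maxSize = 0;
>        long windowStart = System.currentTimeMillis();
>        while (true) {
>          ConsumerRecords<byte[], byte[]> records = consumer.poll(1000);
>          for (ConsumerRecord<byte[], byte[]> r : records) {
>            count++;
>            bytes += r.value().length;
>            maxSize = Math.max(maxSize, r.value().length);
>          }
>          long elapsed = System.currentTimeMillis() - windowStart;
>          if (elapsed >= 60000) {   // report once a minute
>            System.out.printf("events/sec=%.1f avg_size=%.1f max_size=%d%n",
>                count * 1000.0 / elapsed,
>                count == 0 ? 0.0 : (double) bytes / count, maxSize);
>            count = 0; bytes = 0; maxSize = 0;
>            windowStart = System.currentTimeMillis();
>          }
>        }
>      }
>    }
>
>Run against each telemetry topic for the assessment window, the printed
>samples can then be aggregated into the per-telemetry averages and peaks
>described in the original proposal below.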
>
>Thanks,
>James 
>
>
>
>
>On 4/13/16, 9:52 AM, "George Vetticaden" <[email protected]>
>wrote:
>
>>I have used the following Kafka and Storm Best Practices guide at
>>numerous
>>customer implementations.
>>https://community.hortonworks.com/articles/550/unofficial-storm-and-kafka-best-practices-guide.html
>>
>>
>>We need to have something similar and prescriptive for Metron based on:
>>1. What data sources are we enabling
>>2. What enrichment services are we enabling
>>3. What threat intel services are we enabling
>>4. What are we indexing into Solr/Elastic and for how long
>>5. What are we persisting into HDFS
>>
>>Ideally, the Metron assessment tool, combined with an introspection of
>>the user’s Ansible configuration, should drive which Ambari blueprint
>>type and configuration are used when the cluster is spun up and the
>>Storm topology is deployed.
>> 
>>
>>-- 
>>George Vetticaden
>>Principal, COE
>>[email protected]
>>(630) 909-9138
>>
>>
>>
>>
>>
>>On 4/13/16, 11:40 AM, "George Vetticaden" <[email protected]>
>>wrote:
>>
>>>+1 to James's suggestion.
>>>We need to consider not just the data volume and storage requirements
>>>for proper cluster sizing, but the processing requirements as well.
>>>Given that in the new architecture we have moved to a single enrichment
>>>topology that will support all data sources, proper sizing of the
>>>enrichment topology will be even more crucial to maintaining SLAs and
>>>HA requirements.
>>>The following key questions will apply to each parser topology and to
>>>the single enrichment topology (see the sketch after the list):
>>>
>>>1. Number of workers?
>>>2. Number of workers per machine?
>>>3. Size of each worker (in memory)?
>>>4. Supervisor memory settings?
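>>>
>>>As a rough sketch, these four knobs map onto the Storm API and
>>>storm.yaml as follows (all values are hypothetical; older Storm
>>>releases use the backtype.storm.* packages instead of
>>>org.apache.storm.*):
>>>
>>>    import org.apache.storm.Config;
>>>    import org.apache.storm.StormSubmitter;
>>>    import org.apache.storm.topology.TopologyBuilder;
>>>
>>>    public class EnrichmentTopologySizing {
>>>      public static void main(String[] args) throws Exception {
>>>        TopologyBuilder builder = new TopologyBuilder();
>>>        // ... spouts and bolts for the enrichment topology ...
>>>
>>>        Config conf = new Config();
>>>        conf.setNumWorkers(4);                                    // 1. number of workers
>>>        conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx2048m");  // 3. heap per worker
>>>        // 2. workers per machine and 4. supervisor memory are per-node
>>>        // settings in storm.yaml, e.g.:
>>>        //   supervisor.slots.ports: [6700, 6701]   # two worker slots per machine
>>>        //   supervisor.childopts: "-Xmx256m"
>>>        StormSubmitter.submitTopology("enrichment", conf, builder.createTopology());
>>>      }
>>>    }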
>>>
>>>The assessment tool should also be used to size the topologies
>>>correctly.
>>>
>>>Tuning Kafka, HBase, and Solr/Elastic should also be governed by the
>>>Metron assessment tool.
>>>
>>>
>>>-- 
>>>George Vetticaden
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>On 4/13/16, 11:28 AM, "James Sirota" <[email protected]> wrote:
>>>
>>>>Prior to adoption of Metron, each adopting entity needs to guesstimate
>>>>its data volume and data storage requirements so it can size its
>>>>cluster properly.  I propose the creation of an assessment tool that
>>>>can plug in to a Kafka topic for a given telemetry and over time
>>>>produce statistics on ingest volumes and storage requirements.  The
>>>>idea is that prior to adopting Metron someone can set up all the feeds
>>>>and Kafka topics, but instead of deploying Metron right away they
>>>>would deploy this tool.  The tool would then produce statistics on
>>>>data ingest/storage requirements and all other relevant information
>>>>needed for cluster sizing.
>>>>
>>>>Some of the metrics that can be recorded are:
>>>>
>>>>  *   Number of system events per second (average, max, standard dev)
>>>>  *   Message size (average, max, standard dev)
>>>>  *   Average number of peaks
>>>>  *   Duration of peaks (average, max, standard dev)
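>>>>
>>>>As a minimal sketch, all of these can be accumulated in a single pass
>>>>with constant memory using Welford's online algorithm (the class and
>>>>method names here are illustrative, not part of any proposed API):
>>>>
>>>>    // One-pass accumulator for average/max/standard deviation.
>>>>    public class RunningStats {
>>>>      private long n;
>>>>      private double mean, m2;
>>>>      private double max = Double.NEGATIVE_INFINITY;
>>>>
>>>>      public void add(double x) {
>>>>        n++;
>>>>        double delta = x - mean;
>>>>        mean += delta / n;
>>>>        m2 += delta * (x - mean);   // Welford's stable variance update
>>>>        if (x > max) max = x;
>>>>      }
>>>>
>>>>      public double mean()   { return mean; }
>>>>      public double max()    { return max; }
>>>>      public double stdDev() { return n > 1 ? Math.sqrt(m2 / (n - 1)) : 0.0; }
>>>>    }
>>>>
>>>>One instance per metric (event rate, message size, peak duration)
>>>>keeps the probe's memory footprint constant no matter how long it runs.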
>>>>
>>>>If a parser for the telemetry exists, the tool can produce additional
>>>>statistics:
>>>>  *   Number of keys/fields parsed (average, max, standard dev)
>>>>  *   Length of each field parsed (average, max, standard dev)
>>>>  *   Length of each key parsed (average, max, standard dev)
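>>>>
>>>>For the parsed-message case, the probe could feed each parsed JSON
>>>>message through something like the following (a Jackson-based sketch
>>>>that reuses the illustrative RunningStats class above):
>>>>
>>>>    import com.fasterxml.jackson.databind.JsonNode;
>>>>    import com.fasterxml.jackson.databind.ObjectMapper;
>>>>
>>>>    public class ParsedMessageStats {
>>>>      private final ObjectMapper mapper = new ObjectMapper();
>>>>      final RunningStats fieldCount  = new RunningStats();
>>>>      final RunningStats keyLength   = new RunningStats();
>>>>      final RunningStats valueLength = new RunningStats();
>>>>
>>>>      public void record(String parsedJson) throws Exception {
>>>>        JsonNode node = mapper.readTree(parsedJson);
>>>>        fieldCount.add(node.size());                       // number of keys/fields parsed
>>>>        node.fields().forEachRemaining(e -> {
>>>>          keyLength.add(e.getKey().length());              // length of each key
>>>>          valueLength.add(e.getValue().asText().length()); // length of each field value
>>>>        });
>>>>      }
>>>>    }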
>>>>
>>>>The tool can run for a week or a month and produce these kinds of
>>>>statistics.  Once the statistics are available, we can put together
>>>>guidance documentation for a recommended cluster setup.  Otherwise it's
>>>>hard to properly size a cluster and set up streaming parallelism
>>>>without knowing these metrics.
>>>>
>>>>
>>>>Thoughts/ideas?
>>>>
>>>>Thanks,
>>>>James
>>>
>>>
>>
