Based on this design, I submitted PR 208 [1]. Sometimes it is easier to judge something when you have real code to look at (rather than my mangled grammar in a design doc). If anyone has feedback, please let me know.
There was one change to the original design. The original topology design looks something like the following:

  [Kafka Topic] -> KafkaSpout -> ProfileSplitter -> ProfileBuilder -> [HBase Table]

There is a separate instance of a ProfileBuilder for each Profile-Entity pair. For example, if I have a single profile called Profile1 and the data coming in contains 2 entities/hosts associated with that profile, then there would be two different ProfileBuilder instances: the first for Profile1-Host1 and the second for Profile1-Host2. Based on the flush interval (roughly every 15 minutes), each of these instances would flush its data to HBase. This provides little to no opportunity to optimize the writes to HBase, as each instance is writing on its own schedule. This can become really problematic when the number of profiles increases or the number of entities for any single profile increases. (The number of entities increasing is the scarier problem, because it can happen just based on changes in the data being received, not necessarily any action that you knowingly took to add profiles.)

The design change involved adding a separate bolt responsible for writing to HBase. Having a separate bolt in the topology allows the writes to be aggregated and optimized. For example, I can batch the writes from multiple ProfileBuilder bolts and write them to HBase in a single batch. (I have appended a couple of rough sketches of what I mean at the bottom of this mail, below the quoted thread.)

  [Kafka Topic] -> KafkaSpout -> ProfileSplitter -> ProfileBuilder -> HBaseBolt -> [HBase Table]

I think this is a common pattern that occurs in many use cases. At least it occurs frequently enough for Storm to provide some dedicated code to handle it: storm-hbase [2]. Unfortunately, I was not able to use the storm-hbase code because of the versions of Storm and HBase that we use with Metron. The version of Storm we use only supports HBase 0.98.x, and only in the very latest versions of Storm did they bump support up to newer versions of HBase. Trust me, I tried to make it work. It felt icky, but I had to roll my own. I kept that code isolated enough that we can swap in storm-hbase should that ever become a possibility.

[1] https://github.com/apache/incubator-metron/pull/208
[2] https://github.com/apache/storm/tree/master/external/storm-hbase

On Fri, Aug 5, 2016 at 10:54 AM, Nick Allen <[email protected]> wrote:

> https://issues.apache.org/jira/browse/METRON-309
>
> On Fri, Aug 5, 2016 at 8:58 AM, Casey Stella <[email protected]> wrote:
>
>> I don't think the attachment came through, Nick. Can you post the PDF on
>> the JIRA?
>>
>> On Wed, Aug 3, 2016 at 4:22 PM, Nick Allen <[email protected]> wrote:
>>
>> > I have been thinking through the implementation of something that I am
>> > calling the "Entity Profiler." The idea/concept was passed on to me by
>> > James Sirota and I think it would be very useful as a part of Metron.
>> >
>> > I have a draft design that I would love to get feedback on. Please see
>> > the attached PDF. If anything is not clear, please let me know.
>> >
>> > *The Entity Profiler is a feature extraction mechanism that can capture a
>> > Profile that describes any Entity on a network. The Entity might be a
>> > server, user, subnet or application. The Profile itself is simply a time
>> > series of numeric values.*
>> >
>> > *The Entity Profiler will enable feature extraction using sliding windows
>> > over streaming telemetry data. The Entity Profiler will enable a summary
>> > statistic to be applied to raw data over a given time horizon. Collecting
>> > these values across many time horizons results in a time series that is
>> > useful for analysis.*
>> >
>> > Hopefully that is enough of a tease to gain your interest.
>> >
>> > Thanks
>> >
>> > --
>> > Nick Allen <[email protected]>
>>
>
> --
> Nick Allen <[email protected]>

--
Nick Allen <[email protected]>
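
P.S. To make the wiring change a bit more concrete, below is a rough sketch of how the revised topology hangs together. To be clear, this is not the code from the PR: the spout and bolts are passed in as plain Storm interfaces so the sketch stands on its own, the parallelism hints are arbitrary, and the "profile"/"entity" field names are just assumptions for illustration. The two things to notice are the fields grouping, which keeps all of the tuples for a given Profile-Entity pair on the same ProfileBuilder task, and the fact that everything funnels into a single HBaseBolt where the writes can be batched.

// Storm 1.x packages; older releases use backtype.storm.* instead.
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class ProfilerTopologySketch {

  // Wires up: KafkaSpout -> ProfileSplitter -> ProfileBuilder -> HBaseBolt.
  public static StormTopology build(IRichSpout kafkaSpout,
                                    IRichBolt profileSplitter,
                                    IRichBolt profileBuilder,
                                    IRichBolt hbaseBolt) {
    TopologyBuilder builder = new TopologyBuilder();

    // Telemetry enters the topology from a Kafka topic.
    builder.setSpout("kafkaSpout", kafkaSpout, 1);

    // Splits each message into one tuple per (profile, entity) pair.
    builder.setBolt("profileSplitter", profileSplitter, 2)
        .shuffleGrouping("kafkaSpout");

    // The fields grouping routes all tuples for a given (profile, entity)
    // pair to the same ProfileBuilder task.
    builder.setBolt("profileBuilder", profileBuilder, 4)
        .fieldsGrouping("profileSplitter", new Fields("profile", "entity"));

    // Everything funnels into one bolt so the HBase writes can be
    // aggregated and batched.
    builder.setBolt("hbaseBolt", hbaseBolt, 1)
        .shuffleGrouping("profileBuilder");

    return builder.createTopology();
  }
}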

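Here is an equally rough sketch of the batching idea inside the HBase-writing bolt. Again, this is not the code from the PR; it is just the general shape of a hand-rolled replacement for storm-hbase. The batch size, the flush-on-tick approach, the "put" tuple field, and the HBase 0.98 HTableInterface client are all assumptions to keep the example short (in practice you would likely emit the profile measurement itself and build the Put inside the bolt). The point is simply that, with everything funnelled through one bolt, the Puts from many ProfileBuilder instances can be written with a single table.put(List<Put>) call rather than each instance writing on its own schedule.

// Sketch only: assumes upstream ProfileBuilder bolts emit a ready-made
// HBase Put in a tuple field named "put" (that field name is an assumption).
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.storm.Config;
import org.apache.storm.Constants;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class BatchingHBaseBoltSketch extends BaseRichBolt {

  private final String tableName;
  private final int batchSize;
  private final int flushIntervalSecs;
  private transient OutputCollector collector;
  private transient HTableInterface table;
  private transient List<Put> puts;
  private transient List<Tuple> pending;

  public BatchingHBaseBoltSketch(String tableName, int batchSize, int flushIntervalSecs) {
    this.tableName = tableName;
    this.batchSize = batchSize;
    this.flushIntervalSecs = flushIntervalSecs;
  }

  @Override
  public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
    this.collector = collector;
    this.puts = new ArrayList<>();
    this.pending = new ArrayList<>();
    try {
      // HBase 0.98-style client; real code would manage/share the connection.
      this.table = HConnectionManager
          .createConnection(HBaseConfiguration.create())
          .getTable(tableName);
    } catch (IOException e) {
      throw new RuntimeException("Unable to connect to HBase", e);
    }
  }

  @Override
  public void execute(Tuple tuple) {
    if (isTick(tuple)) {
      flush();  // flush whatever has accumulated since the last tick
      return;
    }
    puts.add((Put) tuple.getValueByField("put"));
    pending.add(tuple);
    if (puts.size() >= batchSize) {
      flush();  // or flush early once the batch fills up
    }
  }

  private void flush() {
    try {
      table.put(puts);       // one round trip for the whole batch
      table.flushCommits();
      for (Tuple t : pending) {
        collector.ack(t);
      }
    } catch (IOException e) {
      for (Tuple t : pending) {
        collector.fail(t);   // let Storm replay the batch
      }
    } finally {
      puts.clear();
      pending.clear();
    }
  }

  private boolean isTick(Tuple tuple) {
    return Constants.SYSTEM_COMPONENT_ID.equals(tuple.getSourceComponent())
        && Constants.SYSTEM_TICK_STREAM_ID.equals(tuple.getSourceStreamId());
  }

  @Override
  public Map<String, Object> getComponentConfiguration() {
    // ask Storm to send this bolt a tick tuple on the flush interval
    Map<String, Object> conf = new HashMap<>();
    conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, flushIntervalSecs);
    return conf;
  }

  @Override
  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    // terminal bolt: nothing to emit downstream
  }
}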