Wow, Jesse -- someone actually reads the stuff I write. Thank you! ;-)

You are absolutely right -- the entropy is a simple scalar between 0 and 1
that describes the amount of variety in a signal. In the case of cfengine,
the variety comes from the sources the "signal" is made up of. Low entropy
(near zero) means very focused (little variation), and high entropy means
very varied.

For netstat data, it is the variation of different IP addresses measured
over the last 5 minutes. So you would expect low entropy on a little-used
machine. For processes it could be the number of different process
commands, etc. I implemented this back in 1998 to see if one could learn
something from these data.
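
As a rough illustration of the idea, here is a minimal Python sketch. It
assumes the entropy is the Shannon entropy of the source distribution,
divided by the log of the number of distinct sources so that it falls
between 0 and 1. That normalization, the function name and the sample
addresses are assumptions made for this example; it is not taken from the
cfengine source.

```python
import math
from collections import Counter


def normalized_entropy(sources):
    """Shannon entropy of the observed sources, scaled to the range 0-1.

    Assumption: normalize by log(number of distinct sources). This is an
    illustrative choice, not necessarily what cf-monitord does.
    """
    counts = Counter(sources)
    total = sum(counts.values())
    if len(counts) < 2:
        # Zero or one distinct source: no variety at all.
        return 0.0
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return entropy / math.log(len(counts))


# Hypothetical samples of remote addresses seen in one measurement window.
varied = [f"192.0.2.{i}" for i in range(100)]        # many different sources
focused = ["192.0.2.5"] * 99 + ["192.0.2.7"]         # almost all from one source

print(normalized_entropy(varied))    # close to 1: very varied
print(normalized_entropy(focused))   # close to 0: very focused
```

With this definition a window dominated by a single address scores near 0,
and a window spread evenly over many addresses scores near 1.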

For instance, suppose a host that normally has a pretty high entropy www_in
signal (lots of different users is normal) suddenly gets a very focused
signal from one IP (low entropy) -- this could be an attack. It is
certainly unusual.

LDT stands for Leap Detection Test, and is a way of detecting when
sudden changes occur in signals. It is a different kind of anomaly. It is
also of limited value because the research in that area was never carried
to a conclusion. One day, however, my plan is to revive these things in
the commercial cfengines to add the necessary intelligence to make them
useful, through the GUI Mission Portal.

What Cfengine does with monitoring is really different to any other 
software. Sometimes people say -- we don't want that monitoring stuff 
because we like RRD tool better, or something like that. But these offer 
quite different insights. I think we have a job to do making it easier 
to understand the value of the results. A lot of this is visual, so it 
belongs in a GUI.

After 10 years of researching this, I think the usefulness of the
anomaly detection classes is limited, because the results are so sensitive
to context and depend on many factors. They need human interpretation to
make sense, so really these numbers are for the intelligent sysadmin to
watch with interest.


On 05/10/2011 02:26 PM, Jesse Becker wrote:
> On Tue, May 10, 2011 at 02:08:22AM -0400, Jerome Baum wrote:
>> On Tue, May 10, 2011 at 07:51, Aleksey Tsalolikhin
>> <atsaloli.t...@gmail.com> wrote:
>> What is entropy here and how it is computed?  Are both low and high
>> entropy "bad"?  Or is low entropy good, high entropy bad?
> Generally speaking, entropy is a measurement of "disorder" or
> "variation."  There are specific, formal definitions, but I think that
> these two are sufficient for now.
>
>> Low entropy is bad (not "bad" but bad, for security reasons). Entropy is 
>> basically how much "randomness" is available, which is very important for 
>> cryptographic systems -- such as SSL, SSH, and security in cfengine.
> Right.  Entropy is typically used to make your RNG much more random. :)
> It is possible to "run out" of entropy as well.  An example of this, on
> Linux systems is to compare the behavior of "od /dev/random" and
> "od /dev/urandom".  The output from /dev/random will pause when you run
> out of entropy, whereas output from /dev/urandom has no such limitation.
> The data from /dev/random is considered much higher quality with regard
> to randomness.
>
>> You tend to get low entropy on server systems w/out keyboard and mouse to 
>> take entropy from. For further reading 
>> http://en.wikipedia.org/wiki/Entropy_(computing) helps.
> Yep, and various other sources as well (audio input, video, etc).
>
> Back to cfengine...
>
> The entropy and anomaly classes come from cf-monitord (so if you turn it
> off, you won't get those classes).  The cf-monitord process will try to track
> various metrics, and provide those to cf-agent.  It can actually watch
> the traffic flows, and categorize traffic by port number, but this
> requires, essentially, letting cf-monitord "sniff" all traffic--which
> might not be acceptable in your environment.
>
> Metrics other than network traffic can also be checked.  Your other
> email mentions "loadavg_high_ldt", which means that cf-monitord thinks
> that, at that time, the load average was higher than usual based on the
> "Leap-Detection Test" (hence "ldt").  You may also see entries like
> "messages_high_dev1", which indicate that the current value of the
> metric is more than 1 standard deviation above the average.
>
> This paper also talks about it in detail for CF2:
>
>       http://www.iu.hio.no/cfengine/docs/cfengine-Anomalies.pdf
>
> And this one goes into the mathematics behind it:
>
>       http://www.iu.hio.no/~mark/papers/anomaly.pdf
>
> One of the better explanations of how anomaly detection works is
> actually in the SAGE short-topics booklet that Mark Burgess and Aeleen
> Frisch wrote a few years back.  It uses CF2 syntax, but I believe that
> the general concepts are still valid.  Unfortunately, it doesn't cover
> the LDT stuff I mentioned before.
>
>       http://www.sage.org/pubs/16_cfengine/
>
> Unfortunately, I've been unable to find a paper that discusses anomaly
> detection for CF3 in detail.
>
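
To make the "dev1" test quoted above concrete, here is a minimal Python
sketch of checking whether a value lies more than one standard deviation
above the average. It assumes a plain mean and population standard
deviation over a short list of past samples; how cf-monitord actually
averages its history is not described in this thread, and the function
name and sample values are hypothetical.

```python
import statistics


def is_high_dev(history, current, k=1.0):
    """Return True if `current` is more than k standard deviations above
    the mean of `history`.

    Assumption: a simple mean and population standard deviation over the
    samples given; this is illustrative only.
    """
    mean = statistics.mean(history)
    sigma = statistics.pstdev(history)
    return current > mean + k * sigma


# Hypothetical past values of some metric.
history = [10, 12, 11, 9, 13]

print(is_high_dev(history, 13))   # above mean + 1 sigma
print(is_high_dev(history, 12))   # within 1 sigma of the mean
```

A value that passes this check with k=1 corresponds to the situation that
a class such as "messages_high_dev1" is described as reporting.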

_______________________________________________
Help-cfengine mailing list
Help-cfengine@cfengine.org
https://cfengine.org/mailman/listinfo/help-cfengine
