Wow, Jesse -- someone actually reads the stuff I write. Thank you! ;-) You are absolutely right -- the entropy is a simple scalar between 0 and 1 that describes the amount of variety in a signal. In the case of cfengine the variety comes from the sources the "signal" is made up of. Low entropy (near zero) means a very focused signal with little variation, and high entropy (near one) means a very varied one.
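
To make the 0-1 scale concrete, here is a rough sketch of one way to get such a number. It assumes a Shannon entropy over the observed sources, divided by the maximum entropy possible for that many distinct sources. That normalisation, the function name and the sample addresses are illustrative assumptions, not necessarily what cf-monitord does internally.

```python
import math
from collections import Counter


def normalized_entropy(observations):
    """Return the Shannon entropy of the observations, scaled to 0..1.

    Hypothetical helper for illustration only; the scaling by
    log(number of distinct sources) is an assumption.
    """
    counts = Counter(observations)
    total = sum(counts.values())
    if len(counts) < 2:
        # Zero or one distinct source: there is no variety at all.
        return 0.0
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    # Divide by the largest entropy possible with this many distinct sources.
    return entropy / math.log(len(counts))


# Made-up sample data: 100 connections, almost all from one address.
focused = ["10.0.0.5"] * 99 + ["10.0.0.9"]
# Made-up sample data: 100 connections, each from a different address.
varied = ["10.0.0.%d" % i for i in range(100)]

print("focused:", normalized_entropy(focused))
print("varied: ", normalized_entropy(varied))
```

The focused sample comes out close to 0 and the evenly spread one close to 1, which matches the "focused versus varied" reading above.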

For netstat data, it is the variation of different IP addresses measured in the last 5 minutes, so you would expect low entropy on a little-used machine. For processes it could be the number of different process commands, etc. I implemented this back in 1998 to see if one could learn something from these data. For instance, suppose a host that normally has a pretty high-entropy www_in signal (lots of different users is normal) suddenly gets a very focused signal from one IP (low entropy) -- this could be an attack. It is certainly unusual.

LDT stands for Leap Detection Test, and is a way of detecting when sudden changes occur in signals. It is a different kind of anomaly. It is also of limited value because the research in that area was never carried to a conclusion. One day, however, my plan is to revive these things in the commercial cfengines and add the necessary intelligence to make them useful, through the GUI Mission Portal.

What Cfengine does with monitoring is really different from any other software. Sometimes people say -- we don't want that monitoring stuff because we like RRD tool better, or something like that. But these offer quite different insights. I think we have a job to do making it easier to understand the value of the results. A lot of this is visual, so it belongs in a GUI.

After 10 years of researching this, I think the usefulness of the anomaly detection classes is limited, because they are so sensitive to context and to many other factors. They need human interpretation to make sense, so really these numbers are for the intelligent sysadmin to watch with interest.

On 05/10/2011 02:26 PM, Jesse Becker wrote:
> On Tue, May 10, 2011 at 02:08:22AM -0400, Jerome Baum wrote:
>> On Tue, May 10, 2011 at 07:51, Aleksey
>> Tsalolikhin <atsaloli.t...@gmail.com> wrote:
>> What is entropy here and how is it computed? Are both low and high
>> entropy "bad"? Or is low entropy good, high entropy bad?
> Generally speaking, entropy is a measurement of "disorder" or
> "variation." There are specific, formal definitions, but I think that
> these two are sufficient for now.
>
>> Low entropy is bad (not "bad" but bad, for security reasons). Entropy is
>> basically how much "randomness" is available, which is very important for
>> cryptographic systems -- such as SSL, SSH, and security in cfengine.
> Right. Entropy is typically used to make your RNG much more random. :)
> It is possible to "run out" of entropy as well. An example of this, on
> Linux systems, is to compare the behavior of "od /dev/random" and
> "od /dev/urandom". The output from /dev/random will pause when you run
> out of entropy, whereas output from /dev/urandom has no such limitation.
> The data from /dev/random is considered much higher quality with regard
> to randomness.
>
>> You tend to get low entropy on server systems w/out keyboard and mouse to
>> take entropy from. For further reading
>> http://en.wikipedia.org/wiki/Entropy_(computing) helps.
> Yep, and various other sources as well (audio input, video, etc).
>
> Back to cfengine...
>
> The entropy and anomaly classes come from cf-monitord (so if you turn it
> off, you won't get those classes). The cf-monitord process will try to track
> various metrics, and provide those to cf-agent. It can actually watch
> the traffic flows, and categorize traffic by port number, but this
> requires, essentially, letting cf-monitord "sniff" all traffic -- which
> might not be acceptable in your environment.
>
> Metrics other than network traffic can also be checked. Your other
> email mentions "loadavg_high_ldt", which means that cf-monitord thinks
> that, at that time, the load average was higher than usual based on the
> "Leap-Detection Test" (hence "ldt"). You may also see entries like
> "messages_high_dev1", which indicate that the current value of the
> metric is more than 1 standard deviation above the average.
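
As a rough illustration of the "more than 1 standard deviation above the average" rule behind a class like messages_high_dev1, here is a minimal sketch. It assumes a plain mean and population standard deviation over a short list of past samples; how cf-monitord actually learns its averages is not described in this thread, and the function name and numbers below are made up.

```python
import statistics


def high_dev1_class(metric, history, current):
    """Return '<metric>_high_dev1' if current exceeds mean + 1 sigma, else None.

    Hypothetical helper: the plain mean/stddev over `history` is an
    assumption made for illustration, not cf-monitord's learning method.
    """
    mean = statistics.mean(history)
    sigma = statistics.pstdev(history)
    if sigma == 0:
        # A perfectly flat history gives no scale to measure deviation against.
        return None
    if current > mean + sigma:
        return metric + "_high_dev1"
    return None


# Made-up history of a "messages" metric, followed by two new readings.
history = [10, 12, 11, 9, 10, 11, 12, 10]
print(high_dev1_class("messages", history, 30))
print(high_dev1_class("messages", history, 11))
```

The first reading is far above the learned average and yields the class name; the second sits within one standard deviation and yields nothing.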
>
> This paper also talks about it in detail for CF2:
>
> http://www.iu.hio.no/cfengine/docs/cfengine-Anomalies.pdf
>
> And this one goes into the mathematics behind it:
>
> http://www.iu.hio.no/~mark/papers/anomaly.pdf
>
> One of the better explanations of how anomaly detection works is
> actually in the SAGE short-topics booklet that Mark Burgess and Aeleen
> Frisch wrote a few years back. It uses CF2 syntax, but I believe that
> the general concepts are still valid. Unfortunately, it doesn't cover
> the LDT stuff I mentioned before.
>
> http://www.sage.org/pubs/16_cfengine/
>
> Unfortunately, I've been unable to find a paper that discusses anomaly
> detection for CF3 in detail.
>
_______________________________________________
Help-cfengine mailing list
Help-cfengine@cfengine.org
https://cfengine.org/mailman/listinfo/help-cfengine