--- Mark Waser <[EMAIL PROTECTED]> wrote:
A nice story but it proves absolutely nothing . . . . .
I know a little about network intrusion anomaly detection (it was my
dissertation topic), and yes, it is an important lesson.
Network traffic containing attacks has a higher algorithmic complexity than
traffic without attacks. It is less compressible. The reason has nothing to
do with the attacks themselves, but with arbitrary variations in protocol
usage made by the attacker. For example, the Code Red worm fragments the TCP
stream after the HTTP "GET" command, making it detectable even before the
buffer overflow code is sent in the next packet. A statistical model will
learn that this is unusual (even though legal) in normal HTTP traffic, but
offer no explanation why such an event should be hostile. The reason such
anomalies occur is that when attackers craft exploits, they follow enough of
the protocol to make it work, but often don't care about the undocumented
conventions followed by normal servers and clients. For example, they may
use lower-case commands where most software uses upper case, or they may put
unusual but legal values in the TCP or IP ID fields, or a hundred other
things that make the attack stand out. Even if they are careful, many
exploits require unusual commands or combinations of options that rarely
appear in normal traffic and are therefore less carefully tested.
So my point is that it is pointless to try to make an anomaly detection
system explain its reasoning, because the only explanation is that the
traffic is unusual. The best you can do is have it estimate the probability
of a false alarm based on the information content.
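That "false alarm probability from information content" idea can be sketched roughly as follows. This is a toy illustration, not Matt's actual system: the field names, values, and add-one smoothing are all made up for the example. Each header field gets a value-frequency model learned from past (assumed normal) traffic; a new event is scored by its total surprisal in bits, and the false-alarm probability is estimated as the fraction of normal traffic scoring at least that high.

```python
import math
from collections import Counter

def train(events):
    """Per-field value counts over a corpus of events (dicts)."""
    models = {}
    for ev in events:
        for field, value in ev.items():
            models.setdefault(field, Counter())[value] += 1
    return models

def surprisal(models, event, total):
    """Total -log2 p over the event's fields, with crude add-one smoothing."""
    bits = 0.0
    for field, value in event.items():
        count = models[field][value]           # Counter returns 0 if unseen
        bits += -math.log2((count + 1) / (total + 2))
    return bits

# Toy corpus: normal HTTP requests use "GET" and never fragment the stream.
normal = [{"method": "GET", "frag": "none"}] * 990 + \
         [{"method": "POST", "frag": "none"}] * 10
models = train(normal)

baseline = [surprisal(models, ev, len(normal)) for ev in normal]

# A Code-Red-like event: legal, but a choice normal traffic never makes.
odd = {"method": "GET", "frag": "after-GET"}
s = surprisal(models, odd, len(normal))

# Empirical false-alarm probability: how often normal traffic scores this high.
p_fa = sum(b >= s for b in baseline) / len(baseline)
print(round(s, 2), p_fa)
```

Note that the model can say *how* unusual the event is (about 10 bits here, all from the never-seen fragmentation value), but not *why* it should be hostile, which is exactly the point.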
So the lesson is that AGI is not the only intelligent system where you
should not waste your time trying to understand what it has learned. Even
if you understood it, it would not tell you anything. Would you understand
why a person made some decision if you knew the complete state of every
neuron and synapse in his brain?
You developed a pattern-matcher. The pattern matcher worked (and I would
dispute that it worked better "than it had a right to"). Clearly, you do
not understand how it worked. So what does that prove?
Your contention (or, at least, the only one that continues the previous
thread) seems to be that you are too stupid to ever understand the pattern
that it found.
Let me offer you several alternatives:
1) You missed something obvious.
2) You would have understood it if the system could have explained it to
you.
3) You would have understood it if the system had managed to losslessly
convert it into a more compact (and comprehensible) format.
4) You would have understood it if the system had managed to losslessly
convert it into a more compact (and comprehensible) format and explained it
to you.
5) You would have understood it if the system had managed to lossily
convert it into a more compact (and comprehensible -- and probably even
more correct) format.
6) You would have understood it if the system had managed to lossily
convert it into a more compact (and comprehensible -- and probably even
more correct) format and explained it to you.
My contention is that the pattern it found was simply not translated into
terms you could understand and/or explained.
Further, and more importantly, the pattern matcher *doesn't* understand its
results either and certainly could not build upon them -- thus, it *fails*
the test as far as being the central component of an RSIAI or being able to
provide evidence as to the required behavior of such.
----- Original Message -----
From: "Philip Goetz" <[EMAIL PROTECTED]>
To: <agi@v2.listbox.com>
Sent: Friday, December 01, 2006 7:02 PM
Subject: Re: [agi] A question on the symbol-system hypothesis
> On 11/30/06, Mark Waser <[EMAIL PROTECTED]> wrote:
>> With many SVD systems, however, the representation is more vector-like
>> and *not* conducive to easy translation to human terms. I have two
>> answers to these cases. Answer 1 is that it is still easy for a human
>> to look at the closest matches to a particular word pair and figure out
>> what they have in common.
>
> I developed an intrusion-detection system for detecting brand new
> attacks on computer systems. It takes TCP connections, and produces
> 100-500 statistics on each connection. It takes thousands of
> connections, and runs these statistics thru PCA to come up with 5
> dimensions. Then it clusters each connection, and comes up with 1-3
> clusters per port that have a lot of connections and are declared to
> be "normal" traffic. Those connections that lie far from any of those
> clusters are identified as possible intrusions.
>
> The system worked much better than I expected it to, or than it had a
> right to. I went back and, by hand, tried to figure out how it was
> classifying attacks. In most cases, my conclusion was that there was
> *no information available* to tell whether a connection was an attack,
> because the only information to tell that a connection was an attack
> was in the TCP packet contents, while my system looked only at packet
> headers. And yet, the system succeeded in placing about 50% of all
> attacks in the top 1% of suspicious connections. To this day, I don't
> know how it did it.
>
> -----
> This list is sponsored by AGIRI: http://www.agiri.org/email
> To unsubscribe or change your options, please go to:
> http://v2.listbox.com/member/?list_id=303
>
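The pipeline Philip describes (per-connection header statistics, PCA down to 5 dimensions, clustering, then flagging connections far from every large cluster) can be sketched roughly as below. Everything here is a synthetic stand-in -- the data, the 100 statistics, k=3, and the size threshold are illustrative assumptions, not the original system:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 1000 "normal" connections with 100 header
# statistics each, plus 10 attack connections from a shifted distribution.
normal = rng.normal(0.0, 1.0, size=(1000, 100))
attacks = rng.normal(4.0, 1.0, size=(10, 100))
X = np.vstack([normal, attacks])

# PCA via SVD on mean-centered data, keeping 5 components.
mu = X.mean(axis=0)
Xc = X - mu
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:5].T                      # each connection in 5 dimensions

# Crude k-means (k=3); clusters with many members are declared "normal".
k = 3
centroids = Z[rng.choice(len(Z), size=k, replace=False)]
for _ in range(20):
    dists = np.linalg.norm(Z[:, None] - centroids[None], axis=2)
    labels = dists.argmin(axis=1)
    for j in range(k):
        members = Z[labels == j]
        if len(members):               # skip empty clusters
            centroids[j] = members.mean(axis=0)

sizes = np.bincount(labels, minlength=k)
normal_centroids = centroids[sizes > 50]   # small clusters are not "normal"

# Anomaly score: distance to the nearest normal cluster.
score = np.linalg.norm(Z[:, None] - normal_centroids[None], axis=2).min(axis=1)
top = np.argsort(score)[::-1][:10]         # most suspicious connections
print(sorted(top.tolist()))                # attack rows should top the ranking
```

Note the sketch only sees header-derived statistics, never packet contents -- yet the attack rows still rank highest, because their header-level feature distribution differs from normal traffic, which is consistent with Matt's explanation above.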
-- Matt Mahoney, [EMAIL PROTECTED]