Re: [Clamav-devel] text-based viruses and their signatures

Tomasz Kojm Fri, 28 Nov 2003 06:57:07 -0800

On Fri, 28 Nov 2003 07:45:33 GMT
Tomasz Klim <[EMAIL PROTECTED]> wrote:


> 
> > > My idea is not to use an existing Aho-Corasick engine, but to create
> > 
> > Sorry, but this is a completely misguided idea. Signature matching is
> > the most reliable virus scanning method; other methods may only be
> > used as optional additions.
> 
> I didn't say to drop the Aho-Corasick engine. I was talking about TWO
> engines: one for all files, and a second one for HTML files.

Sorry, my fault: I read too much between the lines.
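For reference, here is a rough sketch of the core idea behind the Aho-Corasick "engine for all files" mentioned above: every signature is compiled into one automaton with failure links, so the input is scanned once no matter how many signatures there are. It is a minimal Python illustration, not ClamAV's C implementation; the class name `AhoCorasick` and the signature names are hypothetical.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho-Corasick automaton over bytes (illustrative, not ClamAV's code)."""

    def __init__(self, patterns):
        # patterns: dict mapping a (hypothetical) signature name to its byte string
        self.goto = [{}]   # goto[state][byte] -> next state
        self.fail = [0]    # failure link of each state
        self.out = [[]]    # signature names recognised in each state
        for name, pat in patterns.items():
            state = 0
            for b in pat:
                if b not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[state][b] = len(self.goto) - 1
                state = self.goto[state][b]
            self.out[state].append(name)
        # Breadth-first pass: a state's failure link always points to a
        # shallower state, whose links are therefore already computed.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for b, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and b not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(b, 0)
                # Also report signatures that end in the failure state.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def scan(self, data):
        """Return (offset_of_last_byte, name) for every signature occurrence."""
        hits = []
        state = 0
        for i, b in enumerate(data):
            while state and b not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(b, 0)
            for name in self.out[state]:
                hits.append((i, name))
        return hits
```

For example, `AhoCorasick({"Sig.A": b"he", "Sig.B": b"she", "Sig.C": b"hers"}).scan(b"ushers")` returns `[(3, 'Sig.B'), (3, 'Sig.A'), (5, 'Sig.C')]`: overlapping matches are all reported in a single pass over the data.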

> 
> > > second one, based on fuzzy logic (www.google.com: ATree,
> > > William Ward Armstrong, Dendronic Decisions Limited). I suggest
> > > using some of his ideas to implement an antiviral HTML parser.
> > 
> > I don't know ATree, but (as a mathematician) I think it will be very
> > hard (and almost unrealistic) to train it to detect suspicious
> > HTML/email data.
> 
> Maybe. But the result would be worth it.

Well, I still think that model will not give good results. Could you
please at least draft an idea of how to train a network to detect
anomalies in HTML files? As I said some time ago, HTML is my weak
point. However, I have good knowledge of stochastic algorithms, and if
you can tell me how to extract 'features' (the ones important to us)
from HTML, I may be able to implement it.
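To make the question about 'features' concrete, here is a rough sketch of what feature extraction from HTML could look like, using Python's standard `html.parser`. It turns a document into a small dictionary of counts that a classifier could be trained on. The particular features (script and iframe tags, `on*` event-handler attributes, calls such as `eval`/`unescape`, long runs of `%xx` or `\xNN` escapes) and the names `FeatureExtractor` and `extract_features` are assumptions chosen for illustration, not something proposed in this thread.

```python
import re
from html.parser import HTMLParser

# Hypothetical list of JavaScript calls often seen in obfuscated HTML payloads.
SUSPICIOUS_CALLS = re.compile(r"\b(eval|unescape|document\.write|fromCharCode)\b")
# Hypothetical feature: runs of 8+ percent- or \x-escaped bytes.
ESCAPE_RUN = re.compile(r"(?:%[0-9a-fA-F]{2}|\\x[0-9a-fA-F]{2}){8,}")


class FeatureExtractor(HTMLParser):
    """Collect a few numeric features from an HTML document (illustrative only)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.features = {
            "script_tags": 0,
            "iframe_tags": 0,
            "event_handlers": 0,   # attributes such as onload=, onclick=
            "suspicious_calls": 0,
            "escape_runs": 0,
        }
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.features["script_tags"] += 1
            self._in_script = True
        elif tag == "iframe":
            self.features["iframe_tags"] += 1
        for name, value in attrs:
            if name.startswith("on"):
                self.features["event_handlers"] += 1
                self._scan_code(value or "")   # handler bodies are code too

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only text inside <script> is inspected as code.
        if self._in_script:
            self._scan_code(data)

    def _scan_code(self, code):
        self.features["suspicious_calls"] += len(SUSPICIOUS_CALLS.findall(code))
        self.features["escape_runs"] += len(ESCAPE_RUN.findall(code))


def extract_features(html):
    parser = FeatureExtractor()
    parser.feed(html)
    parser.close()
    return parser.features
```

Whether counts like these actually separate malicious from benign mail is exactly the open question in this exchange; the sketch only shows the mechanical step of turning HTML into numbers.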

> > > Second: in my opinion, using libpcre is not a very wise idea:
> > 
> > I agree, it will be too expensive (and slow) to keep a separate
> > automaton for every regular expression.
> 
> I think that you don't really need full regular expression support.

We really need it.

> I think that could be implemented more simply and faster than with
> libpcre. It would just be harder for you, nothing else. But the
> results would be nicer.

Heh. I disagree; stochastic algorithms are:

 * very hard to implement (they require deep mathematical knowledge)
 * slow (they require feature extraction and training)

and will, of course, produce a large number of false positive alerts.


> > > 1. it's slow, at least slower than POSIX regex
> > 
> > Please don't compare libpcre with regex!
> 
> Why? It's true. Look at the 'ngrep' sniffer. It has optional

They implement different models of non-deterministic finite automata and
are not fully compatible (libpcre is also much more powerful).

> > > 2. it's unstable/insecure, just like the whole Exim MTA
> > >    (well, it's not a piece of shit, like some other solutions
> > >    I've seen, but on the other hand, it's not that great)
> > 
> > Please prove it ;-): do you know of an RE and a text that will cause
> > libpcre to crash?
> 
> As a commercial company, we will NOT provide any patches for
> anyone. This is my general rule. I can make an exception for you
> (we both know why), but...
> 
> Second, libpcre is complicated, and, rules aside, we just don't have
> time to investigate where exactly the bugs are.
> 
> Third: hmm, let's say that I have a working example. Do you really
> think that I will EVER send it to anyone? Please...

Without proof, your statement that "it's unstable/insecure" is only
slander. Remember that people work hard and spend their free time
making that GREAT software available to us.

> > > 3. using regular expressions itself is a bad idea IMHO,
> > >    search for the File::Scan Perl module...
> > 
> > No, it isn't. We need regular expressions to detect polymorphic
> > viruses.
> 
> It is. You can implement a simpler solution on your own. See above.

No, it isn't ;-)
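To illustrate why fixed byte strings alone are not enough for polymorphic code (which varies some bytes between infections), here is a minimal sketch that translates a hex signature containing wildcards into a regular expression. The wildcard subset is loosely modeled on ClamAV's hex-signature syntax, but the exact subset and the function `hexsig_to_regex` are assumptions for illustration, not ClamAV's code.

```python
import re

# One token of the assumed signature syntax: ??, *, {n}, {n-m}, or a hex byte.
TOKEN = re.compile(r"\?\?|\*|\{(\d+)(?:-(\d+))?\}|[0-9a-fA-F]{2}")


def hexsig_to_regex(sig):
    """Translate a hex signature with wildcards into a compiled bytes regex.

    Assumed subset, loosely modeled on ClamAV hex signatures:
      ab      the literal byte 0xab
      ??      any single byte
      *       any number of bytes
      {n}     exactly n arbitrary bytes
      {n-m}   between n and m arbitrary bytes
    """
    parts = []
    pos = 0
    while pos < len(sig):
        m = TOKEN.match(sig, pos)
        if m is None:
            raise ValueError(f"bad signature syntax at offset {pos}: {sig[pos:]!r}")
        tok = m.group(0)
        if tok == "??":
            parts.append(b".")
        elif tok == "*":
            parts.append(b".*?")
        elif tok.startswith("{"):
            lo, hi = m.group(1), m.group(2)
            if hi is None:
                parts.append(b".{%d}" % int(lo))               # exactly n bytes
            else:
                parts.append(b".{%d,%d}" % (int(lo), int(hi)))  # n to m bytes
        else:
            parts.append(re.escape(bytes.fromhex(tok)))
        pos = m.end()
    # DOTALL so that "." also matches the newline byte 0x0a.
    return re.compile(b"".join(parts), re.DOTALL)
```

For instance, `hexsig_to_regex("6576616c??28{0-3}29")` matches the bytes "eval", any one byte, "(", up to three arbitrary bytes, then ")". Note that `*` becomes a lazy `.*?`; in a backtracking engine such as Python's `re` or libpcre, unbounded gaps like this can make matching slow on large inputs, which is part of the cost trade-off argued above.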

Best regards,
Tomasz Kojm
-- 
      oo    .....       [EMAIL PROTECTED]         www.ClamAV.net
     (\/)\.........     http://www.clamav.net/gpg/tkojm.gpg
        \..........._   0DCA5A08407D5288279DB43454822DC8985A444B
          //\   /\      Fri Nov 28 14:29:16 CET 2003
