On Fri, 28 Nov 2003 07:45:33 GMT Tomasz Klim <[EMAIL PROTECTED]> wrote:
>
> > > MY idea is not using an existing Aho-Corasick engine, but create
> >
> > Sorry but this is a completely missed idea. Signature matching is
> > the most reliable virus scanning method and other methods may only
> > be used optionally.
>
> I didn't say to drop the Aho-Corasick engine. I was talking about TWO
> engines: one for all files, and a second one for HTML files.
Sorry, my fault. I read between the lines.
>
> > > second one, based on fuzzy-logic (www.google.com: ATree,
> > > William Ward Armstrong, Dendronic Decisions Limited). I suggest
> > > using some of his ideas to implement an antiviral HTML parser.
> >
> > I don't know ATree but (as a mathematician) I think it will be very
> > hard (and almost unrealistic) to train it to detect suspicious
> > HTML/email data.
>
> Maybe. But the result would be worth it.
Well, I still think that model will give no good results. Could
you please at least sketch how to train a network to detect
anomalies in HTML files? As I said some time ago, HTML is my weak
point. However, I have a good knowledge of stochastic algorithms, and if
you can tell me how to extract the 'features' that matter to us from
HTML, I may be able to implement it.
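
To make the 'features' question concrete, here is a minimal sketch of what feature extraction from HTML could look like. It uses Python's standard html.parser module. The feature names and the choice of what to count (script, iframe and object tags, on* event handlers, amount of script text) are assumptions made for illustration only, not a proposal anyone in this thread has made, and nothing here says whether such counts would train a useful model.

```python
from html.parser import HTMLParser


class FeatureExtractor(HTMLParser):
    """Counts a few hand-picked (hypothetical) properties of an HTML document."""

    def __init__(self):
        super().__init__()
        self.features = {
            "script_tags": 0,     # number of <script> elements
            "iframe_tags": 0,     # number of <iframe> elements
            "object_tags": 0,     # number of <object>/<embed> elements
            "event_handlers": 0,  # attributes such as onload, onclick
            "script_chars": 0,    # characters of inline script text
        }
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names in lower case.
        if tag == "script":
            self.features["script_tags"] += 1
            self._in_script = True
        elif tag == "iframe":
            self.features["iframe_tags"] += 1
        elif tag in ("object", "embed"):
            self.features["object_tags"] += 1
        for name, _value in attrs:
            if name.startswith("on"):
                self.features["event_handlers"] += 1

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.features["script_chars"] += len(data)


def extract_features(html_text):
    """Return the feature dictionary for one HTML document."""
    parser = FeatureExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.features
```

The resulting dictionary can be flattened into a fixed-length vector, which is the form most training procedures expect as input.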
> > > Second: in my opinion, using libpcre is not a very wise idea:
> >
> > I agree, it will be too expensive (and slow) to keep a separate
> > automaton for every regular expression.
>
> I think that you don't really need full regular expression support.
We really need it.
> I think that it could be implemented simpler and faster than libpcre.
> It would just be harder for you. Nothing else. But the results will be
> nicer.
Heh. I disagree. Stochastic algorithms are:
* very hard to implement (they require deep mathematical knowledge)
* slow (they require feature extraction and training)
and will of course produce a large number of false positive alerts.
> > > 1. it's slow, at least slower than POSIX regex
> >
> > Please don't compare libpcre with regex!
>
> Why? It's true. Look at 'ngrep' sniffer. It has optional
They implement different models of non-deterministic finite automata and
are not fully compatible (also libpcre is much more powerful).
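
A small illustration of the "more powerful" point, as a sketch only. Python's re module is used here as a stand-in because it follows Perl-style syntax; it is not libpcre itself. Both constructs below (lazy quantifiers and lookahead assertions) are not defined by POSIX extended regular expressions. The sample strings are made up.

```python
import re

text = "<b>one</b><b>two</b>"

# Lazy quantifier (.*?): each match stops at the nearest closing tag.
lazy = re.findall(r"<b>(.*?)</b>", text)

# Greedy quantifier (.*): the single match runs to the last closing tag.
greedy = re.findall(r"<b>(.*)</b>", text)

# Lookahead (?=...): match a name only when ".exe" follows,
# without making ".exe" part of the match.
names = re.findall(r"\w+(?=\.exe)", "virus.exe clean.txt")
```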
> > > 2. it's unstable/insecure, just like the whole exim mta
> > > (well, it's not a piece of shit, like some other solutions
> > > I've seen, but on the other hand, it's not so great)
> >
> > Please prove it ;-) : do you know of some RE and text that will cause
> > libpcre to crash?
>
> As a commercial company, we will NOT provide any patches for
> anyone. This is my general rule. I can make an exception for you
> (we both know, why), but...
>
> Second, libpcre is complicated, and even saying nothing of rules,
> we just don't have time to investigate where exactly the bugs are.
>
> Third. Hmm, let's say that I have a working example. Do you really
> think that I will EVER send it to anyone? Please...
Without proof, your statement "it's unstable/insecure" is only
slander. Remember that people work hard and spend their free time to
make that GREAT software available to us.
> > > 3. using regular expressions itself is a bad idea IMHO,
> > > search for File::Scan Perl module...
> >
> > No, it isn't. We need regular expressions to detect polymorphic
> > viruses.
>
> It is. You can implement a simpler solution on your own. See above.
No, it isn't ;-)
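
To show why fixed strings alone fall short for polymorphic code, here is a minimal sketch. The byte values are invented for illustration and are not a real virus signature; Python's re module stands in for whatever regular expression engine the scanner would actually use. The signature is a fixed prefix, a gap of up to 8 arbitrary bytes, and a fixed suffix.

```python
import re

# Hypothetical signature: prefix EB 10, then 0 to 8 arbitrary bytes,
# then suffix CD 21. re.DOTALL lets "." match every byte, including 0x0A.
SIGNATURE = re.compile(rb"\xeb\x10.{0,8}?\xcd\x21", re.DOTALL)

# The same body with the filler bytes of one particular variant,
# as a plain fixed-string signature would store it.
FIXED = b"\xeb\x10\x41\x42\xcd\x21"


def matches_signature(data):
    """Return True if the wildcard signature occurs anywhere in data."""
    return SIGNATURE.search(data) is not None


# Two made-up variants that differ only in the filler bytes.
variant_a = b"\x90\xeb\x10\x41\x42\xcd\x21\x90"
variant_b = b"\xeb\x10\x00\x0a\x43\x44\x45\xcd\x21"
# Prefix and suffix are present but too far apart to count as a match.
clean = b"\xeb\x10" + b"\x00" * 20 + b"\xcd\x21"
```

The fixed string finds only variant_a, while the expression with the bounded gap finds both variants and still rejects the clean sample.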
Best regards,
Tomasz Kojm
--
oo ..... [EMAIL PROTECTED] www.ClamAV.net
(\/)\......... http://www.clamav.net/gpg/tkojm.gpg
\..........._ 0DCA5A08407D5288279DB43454822DC8985A444B
//\ /\ Fri Nov 28 14:29:16 CET 2003
