On 2009-01-26 20:45, Tom Shaw wrote:
> I have run into some problems creating rules. I
> am trying to create phish rules as
>
> R[Filter]:RealURL:DisplayedURL[:FuncLevelSpec]
> or
> MalwareName:TargetType:Offset:HexSignature[:MinEngineFunctionalityLevel:[Max]]
>
> and I am having two problems.
>
> First problem has to do with UTF/UNICODE
> characters as well as various codepages which are
> used in place of ascii in spam and phish. What
> makes this more difficult is that one email might
> contain ascii, another UTF, and yet another
> Latin-2 all representing the same signature. So
> how does one create a regex for the "R" rules
> and/or a HEX sequence that can deal with various
> character sets?
>
The HTML normalizer takes care of this: Unicode characters outside the ASCII range (code points above 127) get converted into &<character-code>;. The easiest way to see what the phishing code is looking at is to run clamscan --debug and grep for 'Phish'; it will show the exact, normalized URLs it is looking at.

However, for .pdb signatures we found type 'H' to be sufficient: you simply list the domain. In fact, the official signatures never used the 'R' type. You'll only need 'R' if you want to match the domain that hosts the phishing site. Is that what you want?

Regular expressions are useful for the whitelist (.wdb format), where they are type 'X' signatures.

> My second source of confusion is with target type. The options are
>
> * 0 = any file
> * 1 = Portable Executable
> * 2 = OLE2 component (e.g. a VBA script)
> * 3 = HTML (normalised)
> * 4 = Mail file
> * 5 = Graphics
> * 6 = ELF
> * 7 = ASCII text file (normalised)
>

These are types for .ndb signatures; they are not needed (or valid) for .pdb/.wdb signatures.

Best regards,
--Edwin
_______________________________________________
Help us build a comprehensive ClamAV guide: visit http://wiki.clamav.net
http://www.clamav.net/support/ml
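Tom's question was how one signature can cope with ASCII, UTF-8 and Latin-2 versions of the same text. The Python sketch below illustrates why normalization makes per-charset regexes unnecessary: once each message is decoded with its declared charset and every non-ASCII character is rewritten as a character reference, all encodings yield the same string. This is not ClamAV's htmlnorm code. The decimal &#NNN; entity form (produced here by Python's built-in "xmlcharrefreplace" error handler) and the normalize() helper are hypothetical assumptions; check the exact form ClamAV emits with clamscan --debug.

```python
# Hypothetical sketch, not ClamAV's htmlnorm code. Assumes non-ASCII
# characters become decimal character references of the form &#NNN; .


def normalize(raw: bytes, charset: str) -> str:
    """Decode raw message bytes with their declared charset, then replace
    every character above code point 127 with a decimal &#NNN; reference."""
    text = raw.decode(charset)
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# The same Polish phrase ("log in"), sent in three different encodings.
phrase = "Zaloguj si\u0119"
variants = {
    "utf-8": normalize(phrase.encode("utf-8"), "utf-8"),
    "iso-8859-2": normalize(phrase.encode("iso-8859-2"), "iso-8859-2"),
    "utf-16": normalize(phrase.encode("utf-16"), "utf-16"),
}

for charset, normalized in variants.items():
    print(f"{charset:>10}: {normalized}")  # each line shows Zaloguj si&#281;

# One normalized string, so one signature could match all three messages.
assert len(set(variants.values())) == 1
```

Under this sketch's assumptions, a signature written against the normalized text (Zaloguj si&#281;) covers all three messages, instead of needing separate hex or regex variants per character set.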
