One correction to the Group 2 signature: it's just '%00@'. The only available method for having a signature removed or modified is to submit one or more False Positive reports at <http://www.clamav.net/reports/fp>, including the details you have covered below. If you would like to be notified of changes in the virus database, you will need to join the clamav-virusdb mailing list <http://lists.clamav.net/cgi-bin/mailman/listinfo/clamav-virusdb>.
You can submit any suggested revised signature through the ClamAV Community Signatures program <http://blog.clamav.net/2014/02/introducing-clamav-community-signatures.html>. I'm not a signature expert by any means, but I would have to agree that both the art and the ClamAV engine's capabilities have improved since this signature was apparently written, and that it should be easy to improve.

-Al-

On Mon, Dec 19, 2016 at 05:40 PM, Jay Gattuso wrote:
>
> Win.Trojan.URLspoof-2
> We’re encountering some issues with this particular “virus”, and having
> worked through what we’re seeing, I wanted to ask a couple of questions.
> The signature is pretty weak.
>
> [main.ndb] Win.Trojan.URLspoof-2:0:*:20687265663d22*0125303040*223e*3c2f
>
> We’ve seen hits against this signature 14 times in 8 years (I’m not sure how
> long it’s been in the defs, but we’ve been checking our ~20 million files
> against ClamAV for 8 years).
> Every hit for Win.Trojan.URLspoof-2 we’ve seen is a false positive.
> Breaking the signature sequence into parts reveals the weakness of this
> particular signature:
>
> Group 1: 20687265663d22 = ‘ href="’
> Group 2: 0125303040 = ‘\x01%00@’
> Group 3: 223e = ‘">’
> Group 4: 3c2f = ‘</’
>
> This false positive is appearing in WARC files
> (http://iipc.github.io/warc-specifications/), and in the earlier variant, ARC
> (http://archive.org/web/researcher/ArcFileFormat.php).
> I’ve been pulling these containers apart, and can see that we only get a hit
> when the signature parts are found across the content container. For us,
> group 1 appears in any piece of HTML, and group 2 appears in a variety of
> file formats including PDF, MP3, MP4 and JPG. Groups 3 and 4 are trivial and
> appear everywhere. The point here is that a hit is never caused by a single
> file as would be found in the wild, only by the aggregation we undertake
> ourselves when creating these WARC files.
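The aggregation effect described above can be illustrated with a small Python sketch. It assumes that `*` in the signature body means "any number of bytes" and approximates that with a non-greedy regular expression; this is only an approximation for illustration, not how the ClamAV engine itself matches. The two sample records are invented stand-ins for an HTML page and a binary file stored in the same container.

```python
import re

# Signature body exactly as quoted in the message ('*' separates the hex groups).
SIGNATURE = "20687265663d22*0125303040*223e*3c2f"


def decode_groups(sig):
    """Return each '*'-separated hex group as raw bytes."""
    return [bytes.fromhex(group) for group in sig.split("*")]


def to_regex(sig):
    """Approximate '*' (any number of bytes) with a non-greedy '.*?'."""
    parts = [re.escape(group) for group in decode_groups(sig)]
    return re.compile(b".*?".join(parts), re.DOTALL)


for number, group in enumerate(decode_groups(SIGNATURE), start=1):
    print(f"Group {number}: {group!r}")

pattern = to_regex(SIGNATURE)

# Hypothetical records: an HTML fragment and some unrelated binary data.
html_record = b'<a href="http://example.org/">link</a>'
binary_record = b"\x00\x10\x01%00@\xff\xd8"

# Scanned separately, neither record contains all four groups in order.
alone_html = pattern.search(html_record) is not None
alone_binary = pattern.search(binary_record) is not None

# Scanned as one aggregated stream, the groups line up across records.
aggregated = pattern.search(html_record + binary_record + html_record) is not None

print(alone_html, alone_binary, aggregated)
```

Because nothing bounds the distance between groups, any container that happens to hold an HTML link, the five Group 2 bytes, and later markup will match, regardless of how far apart they are.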
>
> We run a slightly non-standard conf:
>
> # MaxScanSize
> # Default: 100M
> MaxScanSize 2048M
>
> And
>
> # MaxFileSize
> # Default: 25M
> MaxFileSize 2048M
>
> Questions:
>
> 1) How would I go about getting this signature either removed or
> hardened? For example, if the signature is specifically hunting for a URL,
> perhaps it could be confined to the max URL length * 2 or some such
> (http://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers),
> say 4000 bytes. I’ve never seen a true positive against this signature,
> and I have no idea how common the threat is or what the signature is
> actually looking for, so removing it might not be a great idea.
>
> Are there any resources that might help me to work on a stronger signature
> for this particular threat, and what’s the process for suggesting a
> revision/removal?
>
> 2) These hits all happen in the W/ARC container. These containers are
> simple serialisations of arbitrary files harvested from websites, and their
> associated HTTP transactions. These are used to “replay” web harvests (like
> the Wayback Machine etc). Is there any way we can handle these particular
> file types differently? As these files are aggregations of any number of
> binary items, we are much more likely to encounter false positives, especially
> for weak signatures. We’ve only seen false positives for the Trojan URL
> signature, but I anticipate seeing more when we process the 80 TB of WARCs we
> have waiting to come in – these will translate into ~2 billion files housed in
> several hundred thousand WARC files.
>
> Ideally we ought to be ripping the (W)ARC into its binary parts – by parsing
> an arbitrary aggregation of many files as one coherent file with a single
> payload, I think we’re doing ourselves a disservice. I wondered if there was
> a method within the ClamAV architecture that would support the construction
> of a WARC parser.
> This might allow WARC files to be “properly” consumed as a series of
> disconnected binary items, reducing the likelihood of false positives.
>
> We are also looking at what it would mean for our workflow to explode the
> W/ARCs into their parts before they are presented for scanning, and that’s a
> viable option. For now I’m mainly interested in knowing what we could/could
> not do.
>
>
> Jay Gattuso | Digital Preservation Analyst | Preservation, Research and
> Consultancy
> National Library of New Zealand | Te Puna Mātauranga o Aotearoa
> PO Box 1467 Wellington 6140 New Zealand | +64 (0)4 474 3064
> jay.gatt...@dia.govt.nz<mailto:jay.gatt...@natlib.govt.nz>
>
> _______________________________________________
> clamav-users mailing list
> clamav-users@lists.clamav.net
> http://lists.clamav.net/cgi-bin/mailman/listinfo/clamav-users
>
>
> Help us build a comprehensive ClamAV guide:
> https://github.com/vrtadmin/clamav-faq
>
> http://www.clamav.net/contact.html#ml

-Al-
--
Al Varnell
Mountain View, CA
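As a rough illustration of the "explode the W/ARCs before scanning" option raised in the quoted message, here is a minimal Python sketch that splits an uncompressed WARC stream into its records so each one can be scanned on its own. It assumes the basic WARC layout (a `WARC/` version line, header lines, a blank line, then `Content-Length` bytes of block) and ignores gzip compression, header continuation lines, and the older ARC format. The function names and the sample data are made up for this example.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, block) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines separating one record from the next
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected record start: {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header section
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # The block is read by length, so binary content is handled safely.
        block = stream.read(int(headers["content-length"]))
        yield headers, block


def make_record(record_type, body):
    """Build a tiny, hypothetical WARC record for demonstration purposes."""
    head = (
        f"WARC/1.0\r\nWARC-Type: {record_type}\r\n"
        f"Content-Length: {len(body)}\r\n\r\n"
    ).encode("utf-8")
    return head + body + b"\r\n\r\n"


sample = make_record("response", b'<a href="x">y</a>') + make_record(
    "resource", b"\x01%00@"
)
records = list(iter_warc_records(io.BytesIO(sample)))

for headers, block in records:
    print(headers["warc-type"], len(block))
```

Each block could then be written to its own file and handed to the scanner individually, so that signature parts in unrelated records can no longer combine into a match. For `response` records the block still includes the HTTP headers ahead of the payload, so a further split may be wanted there.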