Re: [clamav-users] Win.Trojan.URLspoof-2 signature and WARC files

Al Varnell Mon, 19 Dec 2016 20:25:23 -0800

One correction to the Group 2 signature, it's just '%00@'.

The only available method for having a signature removed or modified is to 
submit one or more False Positives at 
<http://www.clamav.net/reports/fp>, including the details you have covered 
below.  If you would like to be notified of changes in the virus database, you 
will need to join the clamav-virusdb mailing list 
<http://lists.clamav.net/cgi-bin/mailman/listinfo/clamav-virusdb>.


You can submit any suggested revised signature through the ClamAV Community 
Signatures program 
<http://blog.clamav.net/2014/02/introducing-clamav-community-signatures.html>.

Although I'm not a signature expert by any means, I would have to agree 
that both the art and the ClamAV engine's capabilities have improved since 
this one was apparently written, and it should be easy to improve.

-Al-


On Mon, Dec 19, 2016 at 05:40 PM, Jay Gattuso wrote:
> 
> Win.Trojan.URLspoof-2
> We’re encountering some issues with this particular “virus”, and having 
> worked through what we’re seeing, I wanted to ask a couple of questions.
> The signature is pretty weak.
> 
> [main.ndb] Win.Trojan.URLspoof-2:0:*:20687265663d22*0125303040*223e*3c2f
> 
> 
> We’ve seen hits against this signature 14 times in 8 years (I’m not sure how 
> long it’s been in the defs, but we’ve been checking our ~20 million files 
> against ClamAV for 8 years).
> Every hit for Win.Trojan.URLspoof-2 we’ve seen is a false positive.
> Breaking the signature sequence into parts reveals the weakness of this 
> particular signature:
> 
> Group 1: 20687265663d22 = ‘ href="’
> Group 2: 0125303040 = ‘\x01%00@’
> Group 3: 223e = ‘">’
> Group 4: 3c2f = ‘</’
> 
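For anyone who wants to check the breakdown, here is a minimal Python sketch that splits the hex body of the quoted signature on its `*` wildcards and decodes each fixed run. The function name is made up for illustration, and only the body field of the .ndb line is handled. Group 2 comes out as a 0x01 byte followed by the printable text %00@.

```python
# Hex body of the quoted Win.Trojan.URLspoof-2 signature; '*' separates fixed runs.
SIGNATURE_BODY = "20687265663d22*0125303040*223e*3c2f"


def decode_groups(body: str) -> list[bytes]:
    """Split a hex signature body on '*' and decode each run to raw bytes."""
    return [bytes.fromhex(part) for part in body.split("*")]


groups = decode_groups(SIGNATURE_BODY)
for number, raw in enumerate(groups, start=1):
    # repr() keeps non-printable bytes such as 0x01 visible.
    print(f"Group {number}: {raw!r}")
```
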
> These false positives appear in WARC files 
> (http://iipc.github.io/warc-specifications/) and the earlier ARC variant 
> (http://archive.org/web/researcher/ArcFileFormat.php).
> I’ve been pulling these containers apart, and can see that we only get a hit 
> when the signature parts are found across the content of the container. For 
> us, group 1 appears in any piece of HTML, and group 2 appears in a variety of 
> file formats including PDF, MP3, MP4 and JPG. Groups 3 and 4 are trivial and 
> appear everywhere. The point here is that a hit is never caused by a single 
> file as would be found in the wild, only by the aggregation we undertake 
> ourselves when creating these WARC files.
> 
> We run a slightly non-standard conf:
> 
> # MaxScanSize
> # Default: 100M
> MaxScanSize 2048M
> 
> And
> 
> # MaxFileSize
> # Default: 25M
> MaxFileSize 2048M
> 
> Questions:
> 
> 1)      How would I go about getting this signature either removed or 
> hardened? For example, if the signature is specifically hunting for a URL, 
> perhaps it could be confined to the max URL length * 2 or some such 
> (http://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers),
> say 4000 bytes. I’ve never seen a genuine hit against this signature, and I 
> have no idea how common the threat is or what the signature is actually 
> looking for, so removing it might not be a great idea.
> 
> Are there any resources that might help me to work on a stronger signature 
> for this particular threat, and what’s the process for suggesting a 
> revision/removal?
> 
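To illustrate why bounding the gaps would help, here is a rough Python sketch that emulates the four groups with a regular expression, once with unlimited gaps (like `*`) and once with each gap capped at 4000 bytes. This is only an emulation, not how the ClamAV engine matches, and both sample buffers are invented for the example. ClamAV's body-signature syntax does have bounded wildcards of the form {-n} ("n or fewer bytes"), but whether that is the right fix for this signature is an assumption that the signature authors would need to confirm.

```python
import re

# The four fixed byte runs of the quoted signature.
GROUPS = [b' href="', b"\x01%00@", b'">', b"</"]


def build_pattern(groups, max_gap=None):
    """Join literal groups with unbounded gaps, or gaps of at most max_gap bytes."""
    gap = b".*?" if max_gap is None else b".{0,%d}?" % max_gap
    return re.compile(gap.join(re.escape(g) for g in groups), re.DOTALL)


unbounded = build_pattern(GROUPS)              # emulates the '*' wildcards
bounded = build_pattern(GROUPS, max_gap=4000)  # hypothetical hardened variant

# Invented aggregate: an HTML link, then unrelated binary data that happens to
# contain group 2, then more HTML, each separated by about 10 KB of filler.
aggregate = (
    b'<a href="http://example.org/page.html">home</a>'
    + b"\x00" * 10000
    + b"\x01%00@"
    + b"\xff" * 10000
    + b'<p class="x"></p>'
)

# Invented single file where all four groups sit inside one short link.
single_file = b'<a href="http://trusted.example\x01%00@other.example/">link</a>'

print("aggregate, unbounded:", bool(unbounded.search(aggregate)))
print("aggregate, bounded:  ", bool(bounded.search(aggregate)))
print("single,    bounded:  ", bool(bounded.search(single_file)))
```

In this toy case the aggregate only matches when the gaps are unlimited, while the short link matches either way.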
> 2)      These hits all happen in the W/ARC container. These containers are 
> simple serialisations of arbitrary files harvested from websites, and their 
> associated HTTP transactions. They are used to “replay” web harvests (like 
> the Wayback Machine etc). Is there any way we can handle these particular 
> file types differently? As these files are aggregations of any number of 
> binary items, we are much more likely to encounter false positives, 
> especially for weak signatures. We’ve only seen false positives for the 
> Trojan URL signature, but I anticipate seeing more when we process the 80 TB 
> of WARCs we have waiting to come in – these will translate into ~2 billion 
> files housed in several hundred thousand WARC files.
> 
> Ideally we ought to be ripping the (W)ARC into its binary parts – by parsing 
> an arbitrary aggregation of many files as a single coherent payload, I 
> think we’re doing ourselves a disservice. I wondered if there was a method 
> within the ClamAV architecture that would support the construction of a WARC 
> parser. This might allow WARC files to be “properly” consumed as a series of 
> disconnected binary items, reducing the likelihood of false positives.
> 
> We are also looking at what it would mean for our workflow to explode the 
> W/ARCs into their parts before they are presented for scanning, and that’s a 
> viable option. For now I’m mainly interested in knowing what we could/could 
> not do.
> 
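On the "explode before scanning" option, a minimal Python sketch of splitting an uncompressed WARC stream into its records might look like the following. It assumes the record layout from the WARC specification (a "WARC/" version line, named header fields, a blank line, then Content-Length bytes of content) and does not handle gzip-compressed records or the older ARC format. The function name and the sample records are made up for illustration, and for response records the payload still includes the HTTP headers.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if line in (b"\r\n", b"\n"):
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Read exactly Content-Length bytes so binary payloads are never
        # misread as header or separator lines.
        yield headers, stream.read(int(headers["content-length"]))


def make_record(rec_type, uri, body):
    """Build one synthetic WARC record (test data only)."""
    head = (
        f"WARC/1.0\r\nWARC-Type: {rec_type}\r\n"
        f"WARC-Target-URI: {uri}\r\nContent-Length: {len(body)}\r\n\r\n"
    )
    return head.encode("utf-8") + body + b"\r\n\r\n"


html_body = b'<a href="http://example.org/">home</a>'
binary_body = b"\x00\x01%00@\r\n\xff\xfe"
sample = make_record("response", "http://example.org/", html_body) + make_record(
    "resource", "http://example.org/file.bin", binary_body
)

records = list(iter_warc_records(io.BytesIO(sample)))
for headers, payload in records:
    print(headers["warc-type"], headers["warc-target-uri"], len(payload))
```

Each payload could then be written out and scanned on its own, so that signature parts from unrelated files never end up in the same scanned buffer.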
> 
> Jay Gattuso | Digital Preservation Analyst | Preservation, Research and 
> Consultancy
> National Library of New Zealand | Te Puna Mātauranga o Aotearoa
> PO Box 1467 Wellington 6140 New Zealand | +64 (0)4 474 3064
> jay.gatt...@dia.govt.nz<mailto:jay.gatt...@natlib.govt.nz>
> 
> _______________________________________________
> clamav-users mailing list
> clamav-users@lists.clamav.net
> http://lists.clamav.net/cgi-bin/mailman/listinfo/clamav-users
> 
> 
> Help us build a comprehensive ClamAV guide:
> https://github.com/vrtadmin/clamav-faq
> 
> http://www.clamav.net/contact.html#ml

-- 
Al Varnell
Mountain View, CA
