On 2008-12-30 13:44, Babu.N wrote:
> Hi Edwin,
>
> Thanks for the response.
>
> Please see inline..
>
>
> At 05:26 PM 12/29/2008, Török Edwin wrote:
>
>> On 2008-12-29 12:53, Babu.N wrote:
>>
>>> Hi,
>>>
>>> I am developing SHIM layer for ClamAV to support Freescale pattern
>>> matching hardware. Could you please clarify a few queries:
>>>
>>> 1. Freescale has a pattern matching engine with 64k pattern capacity.
>>>
>>>
>> How long can the patterns be? Does it support wildcards?
>> Does it support regular expressions?
>>
>
> Yes.
>
>
There has to be a limit on the size of a regular expression, or else I could upload a 2 GB regular expression into it ;)

>>> But clamAV has approx 169000 signatures. This means hardware engine
>>> will not be able to accomodate all the signatures.
>>>
>> What if you combine N patterns into a single regular expression
>> (hardware limits allowing).
>> If there is a match, then you use software to tell which of the N
>> patterns matched.
>>
>
> After hardware reports a match in a combined
> regex, how can software distinguish which sub-regex actually matched ?
>

By matching with a specialized trie for the candidate sub-regexes.
For example, let's assume you combine patterns 1, 74, and 192 into a single regex for hardware matching. When the hardware reports a match, software only needs to try matching with a trie containing signatures 1, 74, and 192, which should be very fast.

Keep in mind that in a real situation most files you scan are clean, so you should get matches only when a file is infected. Of course there are also the on-the-fly filetype signatures (html/pe/sfx), which tend to match quite often. But you already speed things up a lot if the hardware can tell you that software only needs to match with a trie that has 4-5 patterns.

Of course those tries should be prebuilt. Also, patterns that are part of logical signatures need special treatment (you need to count how many times the sub-signatures matched).

> I have gone through the function reload_db. It is
> first freeing the existing signatures (cl_free) &
> then loading the new signatures ? which code path
> should I follow to understand that old signatures
> are not released till the last thread finishes it's processing ?
>

cl_engine_free only drops the reference count. The engine is freed only when the refcount reaches zero; otherwise it stays alive.
Best regards,
--Edwin
_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net