On 05/18/2010 09:09 PM, Mohammed Al-Saleh wrote:
> Hi Edwin,
> 
> On Apr 27, 2010, at 7:19 AM, Török Edwin wrote:
> 
>> On 04/26/2010 10:20 PM, Mohammed Al-Saleh wrote:
>>> Hi Edwin,
>>>
>>> Thanks for your reply.
>>> I need to know the cases where ClamAV has performance bottlenecks or issues.
>>
>> The best way to do that is by measuring it.
>> Read the last part of this reply:
>> http://lurker.clamav.net/message/20081204.212941.c9fa45c2.en.html
>>
>>> What kind of texts could make ClamAV take more time than usual?
>>
>> That question is hard to answer, since the signatures change each day,
>> thus the AC trie changes, the prefiltering patterns change ...
>>
>>> Aho-Corasick and Boyer-Moore might have some situations that cause 
>>> performance issues.
>>
>> There is also a prefiltering step now.
>> You can search bugzilla on why it was introduced.
>>
>>> I might consider making improvements or studying the performance impact.
>>
>> Don't expect it to be easy to make improvements.
>>
>> I spent quite a lot of time on the prefiltering step, and the problem is
>> that some signatures falsely match a lot of times (like 'PE' from the PE
>> signature), but the entire signature usually doesn't.
>> So ClamAV has to stop the trie lookup, test the match, continue the trie
>> lookup lots of times.
> 
> My understanding (please correct me if I am wrong) is that the first step in 
> matching (let's ignore the filetype recognition and such) is the prefiltering 
> step.
> If the filter matches then further matching (using either AC or BM) is needed 
> to make sure that it is not a false positive because the filter could contain 
> more patterns than it should (and the filter matches at most 8 characters of 
> the original signature so the other parts might not match).

Yes.
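
As a rough illustration of why a filter hit still needs AC/BM confirmation,
here is a minimal sketch. It is not ClamAV's actual filter: the 2-byte
grams, FILTER_SPAN, and all function names are assumptions made up for this
example. The only properties it shares with the description above are that
it looks at no more than 8 bytes of each signature and that it can say
"maybe" for data that no signature really matches.

```python
# Hypothetical stand-in for a prefilter; NOT ClamAV's implementation.
FILTER_SPAN = 8  # assumption: the filter sees at most 8 bytes per signature


def build_filter(signatures):
    """Record every 2-byte sequence in the first FILTER_SPAN bytes of each signature."""
    grams = set()
    for pattern in signatures:
        head = pattern[:FILTER_SPAN]
        grams.update(head[i:i + 2] for i in range(len(head) - 1))
    return grams


def filter_may_match(grams, data):
    """Return True if data contains any recorded 2-byte sequence.

    False is definitive (no signature of length >= 2 can match);
    True only means "maybe", so an exact matcher must still run.
    """
    return any(data[i:i + 2] in grams for i in range(len(data) - 1))


def prefiltered_scan(signatures, data):
    """Run the cheap filter first, then confirm with an exact substring check."""
    grams = build_filter(signatures)
    if not filter_may_match(grams, data):
        return []  # fast path: the exact matcher is skipped entirely
    # Exact check stands in for AC/BM and removes the filter's false positives.
    return [p for p in signatures if p in data]


signatures = [b"EVIL-PAYLOAD-1234"]
grams = build_filter(signatures)
print(filter_may_match(grams, b"hello world"))      # filter rejects
print(filter_may_match(grams, b"PAYMENT"))          # filter says "maybe"
print(prefiltered_scan(signatures, b"PAYMENT"))     # exact check finds nothing
```

The "PAYMENT" input shares the bytes "PA" and "AY" with the signature head,
so the filter lets it through even though the full signature is absent.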

> I am not sure if I understand your point here and I really want to understand 
> it:
> "So ClamAV has to stop the trie lookup, test the match, continue the trie 
> lookup lots of times."
> Can you please explain this to me more?
> If the filter matches but AC or BM does not, would we return back to the 
> filter to continue from the point it matches?

No, I was referring to how AC works.

After the AC trie detects a match, that match still has to be checked
against the full signature: the AC trie contains only a tiny part of the
entire signature (up to ac_max_depth), and the trie itself doesn't contain
wildcards etc.
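
A small sketch of that stop/check/continue pattern, under loud assumptions:
this is a plain depth-limited trie walked from every offset, without the
failure links a real Aho-Corasick automaton has, and none of the names
below come from ClamAV. MAX_DEPTH merely plays the role of ac_max_depth,
and the signature is invented. It counts how often the scan has to pause
and run the full check.

```python
# Hypothetical sketch: depth-limited trie lookup followed by full verification.
MAX_DEPTH = 2    # stand-in for ac_max_depth; real values differ
WILDCARD = None  # matches any single byte inside a full signature


def build_trie(signatures):
    """Insert only the first MAX_DEPTH bytes of each signature into a trie.

    Assumes every signature is at least MAX_DEPTH long and starts with literals.
    """
    root = {}
    for name, pattern in signatures.items():
        node = root
        for byte in pattern[:MAX_DEPTH]:
            node = node.setdefault(byte, {})
        node.setdefault("sigs", []).append(name)
    return root


def verify(pattern, data, pos):
    """Check the whole signature, wildcards included, at offset pos."""
    if pos + len(pattern) > len(data):
        return False
    return all(p is WILDCARD or p == data[pos + i] for i, p in enumerate(pattern))


def scan(data, signatures):
    """Return (confirmed hits, number of times the full check had to run)."""
    root = build_trie(signatures)
    hits, checks = [], 0
    for pos in range(len(data)):
        node = root
        for byte in data[pos:pos + MAX_DEPTH]:
            node = node.get(byte)
            if node is None:
                break  # prefix not in trie, move to next offset
        else:
            # Trie prefix matched: stop, test the full signature, then continue.
            for name in node.get("sigs", []):
                checks += 1
                if verify(signatures[name], data, pos):
                    hits.append((name, pos))
    return hits, checks


signatures = {"Demo.PE": list(b"PE\x00\x00") + [WILDCARD] + list(b"\x01")}
data = b"PEPPER PE PE\x00\x00\x4c\x01"
print(scan(data, signatures))  # ([('Demo.PE', 10)], 4)
```

Here the two bytes "PE" occur four times in the input, so the full check
runs four times, but only the last occurrence is a real match.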

Best regards,
--Edwin
_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net