On 05/18/2010 09:09 PM, Mohammed Al-Saleh wrote:
> Hi Edwin,
>
> On Apr 27, 2010, at 7:19 AM, Török Edwin wrote:
>
>> On 04/26/2010 10:20 PM, Mohammed Al-Saleh wrote:
>>> Hi Edwin,
>>>
>>> Thanks for your reply.
>>> I need to know the cases where ClamAV has performance bottlenecks or issues.
>>
>> The best way to do that is by measuring it.
>> Read the last part of this reply:
>> http://lurker.clamav.net/message/20081204.212941.c9fa45c2.en.html
>>
>>> What kind of texts that could make ClamAV takes more time than usual.
>>
>> That question is hard to answer, since the signatures change each day,
>> thus the AC trie changes, the prefiltering patterns change ...
>>
>>> Aho-Corasick and Boyer-Moore might have some situations that cause
>>> performance issue.
>>
>> There is also a prefiltering step now.
>> You can search bugzilla on why it was introduced.
>>
>>> I might consider doing improvements or study performance impact.
>>
>> Don't expect it to be easy to make improvements.
>>
>> I spent quite a lot of time on the prefiltering step, and the problem is
>> that some signatures falsely match a lot of times (like 'PE' from the PE
>> signature), but the entire signature usually doesn't.
>> So ClamAV has to stop the trie lookup, test the match, continue the trie
>> lookup lots of times.
>
> My understanding (please correct me if I am wrong) is that the first step in
> matching (let's ignore the filetype recognition and such) is the prefiltering
> step.
> If the filter matches then further matching (using either AC or BM) is needed
> to make sure that it is not a false positive because the filter could contain
> more patterns than it should (and the filter matches at most 8 characters of
> the original signature so the other parts might not match).

Yes.

> I am not sure if I understand your point here and I really want to understand
> it:
> "So ClamAV has to stop the trie lookup, test the match, continue the trie
> lookup lots of times."
> Can you please explain this to me more?
> If the filter matches but AC or BM does not, would we return back to the
> filter to continue from the point it matches?

No, I was referring to how AC works.
After the AC trie detects a match, it needs to verify it: the AC trie contains
only a tiny part of the entire signature (up to ac_max_depth), and the trie
itself doesn't contain wildcards etc.

Best regards,
--Edwin
_______________________________________________
http://lurker.clamav.net/list/clamav-devel.html
Please submit your patches to our Bugzilla: http://bugs.clamav.net
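
To make the AC point above concrete, here is a minimal Python sketch of the
"short indexed prefix, then full verification" pattern. It is not ClamAV's
code: a plain dictionary of prefixes stands in for the AC trie (no failure
links), and the signature format, the WILDCARD marker, the "toy.pe" signature
and the names build_prefix_index, verify and scan are all hypothetical. Only
the idea that the index holds at most max_depth literal bytes and no
wildcards is taken from the message.

```python
from collections import defaultdict

WILDCARD = None  # hypothetical marker for "any byte" in this toy format


def build_prefix_index(signatures, max_depth=2):
    """Index each signature by its leading literal bytes (at most max_depth)."""
    index = defaultdict(list)
    for name, pattern in signatures.items():
        prefix = []
        for b in pattern[:max_depth]:
            if b is WILDCARD:
                break  # the index never contains wildcards
            prefix.append(b)
        if not prefix:
            raise ValueError(f"{name}: pattern must start with a literal byte")
        index[bytes(prefix)].append(name)
    return index


def verify(pattern, data, pos):
    """Check the whole pattern, wildcards included, against data at pos."""
    if pos + len(pattern) > len(data):
        return False
    return all(p is WILDCARD or p == data[pos + i] for i, p in enumerate(pattern))


def scan(signatures, data, max_depth=2):
    """Return (number of prefix hits, list of verified (name, offset) matches)."""
    index = build_prefix_index(signatures, max_depth)
    candidates = 0
    matches = []
    for pos in range(len(data)):
        # Only try prefix lengths that still fit inside the buffer.
        for depth in range(1, min(max_depth, len(data) - pos) + 1):
            for name in index.get(data[pos:pos + depth], ()):
                candidates += 1  # the lookup stops here to test the match
                if verify(signatures[name], data, pos):
                    matches.append((name, pos))
    return candidates, matches


# Toy signature: "PE", two arbitrary bytes, then 0x4C 0x01.
signatures = {"toy.pe": [0x50, 0x45, WILDCARD, WILDCARD, 0x4C, 0x01]}
data = b"PEOPLE OPEN PE\x00\x00\x4c\x01"
candidates, matches = scan(signatures, data)
print(candidates, matches)  # 3 [('toy.pe', 12)]
```

In this toy input the two-byte prefix "PE" is hit three times, but only the
hit at offset 12 survives full verification. The other two are the kind of
false hits that force the matcher to stop, test, and continue.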