Hey guys,

Thanks for comments!

I got something interesting, measurements and graphs:

& I add my comment:
"Intersections where Boyer-Moore improves performance and where
Aho-Corasick gets slower is
shifting to the right as amount of patterns is increasing (together
with file size)."

I am looking forward for feedback, especially from Edwin, was the
measurements correct? Do results conform to your thoughts/expectation
about those algorithms?
GiM it seems that BM is efficient for same cases, see link above.
(BTW,Thanks for brief history of algorithms implemented in previous
CLAMAV versions)

My second comment:

"The measurements were performed 3 times, For 3 trials per measurement
a small difference can occur. This small deviation is in around ~ 30
ms max and can be neglected. Average value from 3 trials has been
Very amazing results can be observed. For small number of signatures
(20, 100, 200 signatures in database) Aho-Corasick is much better than
Boyer-Moore, To scan 2MB file it takes 2 times longer for Boyer-Moore
algorithm than Aho-Corasick. But for more than 10.000 signatures in
database for the same size of file Boyer-Moore make up for lost time.
Still for larger files ( with size greater than 3 MB) for the same
signature database Aho-Corasick is better."


On Sat, Dec 20, 2008 at 7:36 PM, GiM <g...@skrzynka.pl> wrote:
> Thomasz Blaszczyk in message 'Re: [Clamav-devel] clamAV scanning algorithm' 
> wrote:
>> >
>> > if you switch ClamAV to use only AC, you'll notice a significant
>> > performance improvement, at the expense of increased memory usage for
>> > the DB.
>> Right, AC trees are quite large and takes lot of memory..
>> So BM is only used to save memory? I guess, it was implemented first
>> and some people still feel sentiment to this algorithm..:)
>> Since AC works faster and handles wildcards...
> If I remember correctly it was just on the contrary,
> AC was implemented first (version I recall was 0.67,
> I haven't looked at a code of earlier versions except
> some truly historical stuff like 0.11 or 0.15)
> and later because of memory problems Edwin mentioned
> at least a few times BM was added to resolve those
> issues.
>  cheers,
> --
>  main(int a[puts("Michal 'GiM' Spadlinski")]){}

"Stay Hungry. Stay Foolish." Steve Jobs
Please submit your patches to our Bugzilla: http://bugs.clamav.net

Reply via email to