Re: [PATCH] Boyer-Moore extension for textsearch infrastructure

Pablo Neira Sat, 20 Aug 2005 08:22:26 -0700

Thomas Graf wrote:

* Pablo Neira <[EMAIL PROTECTED]> 2005-08-20 16:35

Attached the implementation of the Boyer-Moore[1] string searchalgorithm for brand new textsearch infrastructure.



Are you aware that this implementation is only correct under
the assumption that a pattern is never spread over mutliple
blocks? You'll have to add a buffer of patlen size and
continously fill it with data from get_next_block() to make
sure it works for non-linear data.

I'm not saying you have to add this but it must be documented
somewhere. This algorithm is still very useful if comparisons
of non-linear data is not a requirement.

Yes, aware of such limitation. This will be a drawback we'll have tolive with until I come up with the variant to look for matches on theborders. I've added a note to the BM implementation to document thisissue, see the patch below, applies on top of the previous posted patch.

@Thomas: I did some little benchmarks[2] to compare Knuth-Pratt-Morrisvs Boyer-Moore. I've used the iptables string match to add rules for thepacket classification.
Thank you, it would be interesting to see how they compare once
you've either added the above mentioned buffer or use a naive
or kmp match at the borders of each block.

Agreed, I'll benchmark BM vs KMP once again once BM implements thatthing you've described above and pass you the results.


Signed-off-by: Pablo Neira Ayuso <[EMAIL PROTECTED]>

--
Pablo

Index: netfilter-2.6.14/lib/ts_bm.c
===================================================================
--- netfilter-2.6.14.orig/lib/ts_bm.c	2005-08-20 17:10:03.000000000 +0200
+++ netfilter-2.6.14/lib/ts_bm.c	2005-08-20 17:19:47.000000000 +0200
@@ -19,6 +19,19 @@
  *
  *   [2] Handbook of Exact String Matching Algorithms, Thierry Lecroq, 2004
  *       http://www-igm.univ-mlv.fr/~lecroq/string/string.pdf
+ *
+ *   Note: Since Boyer-Moore (BM) performs searches for matchings from left 
+ *   to right, it's still possible that a matching could be spread over 
+ *   multiple blocks, in that case this algorithm won't find any coincidence.
+ *   
+ *   If your willing to ensure that such thing won't ever happen, use the
+ *   Knuth-Pratt-Morris (KMP) implementation instead. So, choose the proper 
+ *   string search algorithm depending on your setting. Say you're using the 
+ *   textsearch infrastructure for filtering, NIDS or any similar security
+ *   focused purpose, then go KMP. Otherwise, if you really care about 
+ *   performance, say you're classifying packets to apply Quality of 
+ *   Service (QoS), load balacing,... policies, and you don't mind about
+ *   possible matchings spread over multiple fragments then go BM.
  */
 
 #include <linux/config.h>

Re: [PATCH] Boyer-Moore extension for textsearch infrastructure

Reply via email to