Thomas Graf wrote:
* Pablo Neira <[EMAIL PROTECTED]> 2005-08-20 16:35
Attached the implementation of the Boyer-Moore[1] string search
algorithm for brand new textsearch infrastructure.
Are you aware that this implementation is only correct under
the assumption that a pattern is never spread over mutliple
blocks? You'll have to add a buffer of patlen size and
continously fill it with data from get_next_block() to make
sure it works for non-linear data.
I'm not saying you have to add this but it must be documented
somewhere. This algorithm is still very useful if comparisons
of non-linear data is not a requirement.
Yes, aware of such limitation. This will be a drawback we'll have to
live with until I come up with the variant to look for matches on the
borders. I've added a note to the BM implementation to document this
issue, see the patch below, applies on top of the previous posted patch.
@Thomas: I did some little benchmarks[2] to compare Knuth-Pratt-Morris
vs Boyer-Moore. I've used the iptables string match to add rules for the
packet classification.
Thank you, it would be interesting to see how they compare once
you've either added the above mentioned buffer or use a naive
or kmp match at the borders of each block.
Agreed, I'll benchmark BM vs KMP once again once BM implements that
thing you've described above and pass you the results.
Signed-off-by: Pablo Neira Ayuso <[EMAIL PROTECTED]>
--
Pablo
Index: netfilter-2.6.14/lib/ts_bm.c
===================================================================
--- netfilter-2.6.14.orig/lib/ts_bm.c 2005-08-20 17:10:03.000000000 +0200
+++ netfilter-2.6.14/lib/ts_bm.c 2005-08-20 17:19:47.000000000 +0200
@@ -19,6 +19,19 @@
*
* [2] Handbook of Exact String Matching Algorithms, Thierry Lecroq, 2004
* http://www-igm.univ-mlv.fr/~lecroq/string/string.pdf
+ *
+ * Note: Since Boyer-Moore (BM) performs searches for matchings from left
+ * to right, it's still possible that a matching could be spread over
+ * multiple blocks, in that case this algorithm won't find any coincidence.
+ *
+ * If your willing to ensure that such thing won't ever happen, use the
+ * Knuth-Pratt-Morris (KMP) implementation instead. So, choose the proper
+ * string search algorithm depending on your setting. Say you're using the
+ * textsearch infrastructure for filtering, NIDS or any similar security
+ * focused purpose, then go KMP. Otherwise, if you really care about
+ * performance, say you're classifying packets to apply Quality of
+ * Service (QoS), load balacing,... policies, and you don't mind about
+ * possible matchings spread over multiple fragments then go BM.
*/
#include <linux/config.h>