On 8/28/2011 6:12 AM, Stevan Bajic wrote:
Should be fixed in GIT
Further testing found another code path that leads to a memory leak. I didn't realize these lines were inside a for loop the first time around; without the loop, a free call wouldn't be needed.
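A minimal sketch of this kind of leak, using a hypothetical struct and helper names rather than DSPAM's actual header type or decode.c code: when a pointer is reassigned on every pass of a loop, the previous buffer has to be freed first, otherwise each iteration leaks one allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a parsed header; not DSPAM's real struct. */
struct demo_header {
    char *original_data;
};

/* Duplicate a string with malloc; returns NULL on allocation failure. */
char *demo_strdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/*
 * Store a copy of each value in turn, keeping only the last one.
 * Returns the number of old buffers released along the way.
 */
int demo_store_all(struct demo_header *h, const char **values, int count)
{
    int released = 0;

    assert(h != NULL && values != NULL);
    for (int i = 0; i < count; i++) {
        if (h->original_data != NULL) {
            /* Without this free, every pass after the first leaks. */
            free(h->original_data);
            h->original_data = NULL;
            released++;
        }
        h->original_data = demo_strdup(values[i]);
    }
    return released;
}
```

Called once, outside a loop, on a header whose original_data is NULL, the free branch never runs, which is why the leak only shows up when the same header is processed repeatedly.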

Most people use the Apache SpamAssassin corpora or the TREC corpora for testing.
I was wondering if there was an actively maintained corpus available. Something like compressionratings.com -- but for email classification. Something I could test my configuration/build against to see where it ranks.

You could use dspam_train and use dspam_stats to set/reset the snapshot.

Yeah! But I'll need a script or tool to automate the testing process? I use the library API so I'm not a good person to write such a test script.


I don't understand what you mean by this. Are you trying to get a
certain score/result that you can compare with other DSPAM
users/developers?

Exactly! If I run my build against an identical corpus I should get identical results! If the results vary, I know it's time to look for bugs. The goal is to catch a string function or memory allocator that behaves differently. I can always decide the deviation is small enough to ignore... but only if I have results for comparison.
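The comparison described above could be sketched roughly as follows. The struct, its field names, and the tolerance parameter are all hypothetical; nothing here comes from DSPAM or its tools. The idea is only to reduce each run to an error rate and flag a build whose rate drifts from the reference by more than a chosen amount.

```c
#include <assert.h>

/* Hypothetical tally of one test run over a fixed corpus. */
struct run_tally {
    long spam_total;      /* spam messages in the corpus */
    long ham_total;       /* ham messages in the corpus */
    long spam_missed;     /* spam classified as ham */
    long ham_misflagged;  /* ham classified as spam */
};

/* Fraction of messages classified incorrectly; 0.0 for an empty corpus. */
double error_rate(const struct run_tally *t)
{
    long total = t->spam_total + t->ham_total;

    if (total == 0)
        return 0.0;
    return (double)(t->spam_missed + t->ham_misflagged) / (double)total;
}

/*
 * Returns 1 when the two runs differ by more than `tolerance`
 * (absolute difference in error rate), 0 otherwise.
 */
int tally_deviates(const struct run_tally *baseline,
                   const struct run_tally *current,
                   double tolerance)
{
    double diff = error_rate(current) - error_rate(baseline);

    if (diff < 0.0)
        diff = -diff;
    return diff > tolerance;
}
```

A tolerance of 0.0 demands identical error rates; a small positive value corresponds to deciding that a deviation is small enough to ignore.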

I don't know how others benchmark their setup (or whether they
benchmark it at all). Over the years I have developed my own
testing and training method. I don't use stock DSPAM methods at all. I
guess other DSPAM users/admins have established their own test and
training procedures as well.

I was hoping to find the tools/scripts/notes to test my build/implementation. It would be nice if those tools became part of 'make check'. If the testing bits take up too much space, just distribute them as a separate tarball. The libxml2 project uses that strategy. Each release tarball is paired with its own test tarball. Check out ftp://xmlsoft.org/libxml2/ for the files.

This is difficult, since the backend is selected with ./configure but
is most likely not initialized. A 'make check' would require a
properly configured backend (with the schema and access already
set up), which is not available on a fresh/new setup at
compile time.

A good start might be to compile the command line utilities and/or test programs using the file system storage driver. If the check variants are compiled in response to 'make check' and stored inside the test folder, they shouldn't cause any problems. Then it's just a matter of automating the test process. And if the results are stored under the build tree, they could easily be purged with 'make clean'. ClamAV ships with a test corpus, and 'make check' will test the corpus against the command line tools. It checks that a reasonable amount of memory was needed, that the program finished quickly and, most importantly, that it generated the expected classification.

This strategy could be used to test libdspam and could allow limited testing of the command line utilities. IMO that's the most important chunk of code.

When time allows, adding logic to test different storage configurations should be possible. Just write the check script with the assumption that a valid test database is available. If the dspam user can't connect to localhost using the password 'bajic', then 'make check' simply fails.

If you wanted to get a little more complicated, try executing the RDBMS binary against a localized config file. Then initialize your blank database schema and listen for connections via a file socket or named pipe. Since the database files are stored inside the build tree, they can be purged and recreated each time 'make check' is called. Check out the MySQL tarball and run "./configure; make && make check" for the details.

P.S. If anyone else decides to test DSPAM using Valgrind, the current release (3.6.1) will complain about glibc str functions reading dirty memory via aligned reads. The issue is fixed in the valgrind code repository -- for those willing/able to compile a 3.7.0 snapshot.
--- decode.c
+++ decode.c 
@@ -491,9 +493,13 @@
           free(header->concatenated_data);
           header->concatenated_data = decoded;
         }
-        else if (was_null) {
-          header->original_data = NULL;
-        }
+        else if (was_null && header->original_data) {
+                                       free(header->original_data);
+                                       header->original_data = NULL;
+                               }
+                               else if (was_null) {
+                                       header->original_data = NULL;
+                               }
       }
     }
 
_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel
