Bug Tracker item #2826644, was opened at 2009-07-24 16:50
Message generated for change (Comment added) made by csmr
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1126467&aid=2826644&group_id=250683

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: daemon
Group: v3.9.0
Status: Open
Resolution: None
Priority: 9
Private: No
Submitted By: Carlo Rodrigues (csmr)
Assigned to: Stevan Bajic (sbajic)
Summary: dspam crashes when training with signature

Initial Comment:
There was one message that I could not train via web-ui and always got an
error. I checked on dspam.debug for the problem.

$ /usr/bin/dspam --source=error --class=innocent --signature=5,4a64ad79114921676967107 --user ad...@net4b.pt
*** glibc detected *** /usr/bin/dspam: double free or corruption (out): 0x00007fffec7d2740 ***

Attached is the output of valgrind -v --show-reachable=yes.
If there is anything else needed, maybe the tokens, just ask.

Cheers,
Carlo Rodrigues

----------------------------------------------------------------------

>Comment By: Carlo Rodrigues (csmr)
Date: 2009-07-24 23:52

Message:
> BINGO! Let's talk in a constructive way about that issue. Okay? Message is
> tagged as Virus and DSPAM does NOT learn the message (it has delegated the
> control to ClamAV and ClamAV said it is a Virus so no need for DSPAM to
> learn anything about it). So no tokens are available. But the signature is
> written to the database and just holding an uninitialized token.
>
> Assuming now that the classification from ClamAV was wrong and we want to
> relearn the message
>
> Scenario 1:
> To be able to reclassify the message we would need to instruct DSPAM to
> tokenize the message and add the signature regardless of what ClamAV said.
> Without that we are not able to learn anything (since we just have an empty
> token).
>
> Scenario 2:
> We allow reclassification BUT we just switch tags in log and DON'T do any
> training. So it is more or less visual cosmetics but nothing is really
> processed.
>
> I personally would go for 2 since we handed out control to ClamAV and if
> ClamAV thinks it is a virus then we save ourselves time by not tokenizing
> the message and just flag the message as Virus. If now a reclassification
> happens then we just capture that the token is empty and log the
> reclassification but don't do any real training.
>
> What do you think?
>

Personally I see no need of tokenizing messages that are identified as
viruses.
If it isn't a virus and it was wrongly identified as being so by ClamAV (it
happened to me twice before, when testing dspam on a production domain) and
it is quarantined, then checking the message on the quarantine panel and
hitting "Deliver Checked" gets the job done.
It really makes no sense trying to (re)learn the message as ham, because
next time the same message or attachment is sent, ClamAV will catch it again.
DSPAM will never let it through as long as ClamAV is there.

Scenario 2 looks OK to me. Maybe a bit confusing for the user, but no harm
can come from it.

Scenario 3 -> Not presenting the chance of retraining messages identified
as viruses. No checkbox, no "As Innocent" text.
I think this option would be the best one. But if it takes a lot of effort
to do this, and someone is working on a new web-ui, better choose scenario 2
for now.

> btw: That thing should not have crashed! I have to crawl the code and
> capture and fix that issue anyway.
>

At least this time it crashes on your side as well :)

> btw2: Good that you found that problem.
>
>
> Kind Regards from Switzerland
>
> Stevan Bajic
>
>

Cheers,
Carlo Rodrigues

----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2009-07-24 21:16

Message:
Never seen that before but now I went and recompiled DSPAM with ClamAV and
redid the test:
------
theia vuadmin # dspam --process --user ste...@bajic.ch --deliver=summary < teste_virus.eml
X-DSPAM-Result: ste...@bajic.ch; result="Spam"; class="Virus"; probability=1.0000; confidence=1.00; signature=4,4a6a0faa21721695616434
theia vuadmin # mysql --user=root --password=$(cat /mnt/gentoo.scripts/mysql.pwd) --batch -e "select uid,signature,hex(data),length,created_on from sysdb_dspam.dspam_signature_data where signature='4,4a6a0faa21721695616434'\G"
*************************** 1. row ***************************
       uid: 4
 signature: 4,4a6a0faa21721695616434
 hex(data): 0000000000000000
    length: 8
created_on: 2009-07-24
theia vuadmin #
------

Okay. Now let's do the reclassification:
-----
theia vuadmin # dspam --class=innocent --source=error --signature=4,4a6a0faa21721695616434 --user ste...@bajic.ch
*** glibc detected *** dspam: free(): invalid pointer: 0xbfe3e5ac ***
======= Backtrace: =========
/lib/libc.so.6[0xb7ed0fe1]
/lib/libc.so.6[0xb7ed270a]
/lib/libc.so.6(cfree+0x6d)[0xb7ed576d]
/usr/lib/libdspam.so.7(_ds_operate+0x6b0)[0xb7fccf77]
/usr/lib/libdspam.so.7(dspam_process+0x2c9)[0xb7fcd757]
dspam(retrain_message+0x197)[0x804e8b5]
dspam(process_message+0x282)[0x80529d4]
dspam(process_users+0xd3a)[0x8054c8c]
dspam(main+0x4a3)[0x8055b16]
/lib/libc.so.6(__libc_start_main+0xe6)[0xb7e7ba66]
dspam[0x804b971]
======= Memory map: ========
08048000-08060000 r-xp 00000000 fd:04 619760 /usr/bin/dspam
08060000-08061000 r--p 00017000 fd:04 619760 /usr/bin/dspam
08061000-08062000 rw-p 00018000 fd:04 619760 /usr/bin/dspam
08062000-080d0000 rw-p 00000000 00:00 0 [heap]
b7a00000-b7a21000 rw-p 00000000 00:00 0
b7a21000-b7b00000 ---p 00000000 00:00 0
b7bd1000-b7beb000 r-xp 00000000 fd:04 4662652 /usr/lib/gcc/i686-pc-linux-gnu/4.4.0/libgcc_s.so.1
b7beb000-b7bec000 r--p 00019000 fd:04 4662652 /usr/lib/gcc/i686-pc-linux-gnu/4.4.0/libgcc_s.so.1
b7bec000-b7bed000 rw-p 0001a000 fd:04 4662652 /usr/lib/gcc/i686-pc-linux-gnu/4.4.0/libgcc_s.so.1
b7bed000-b7c00000 r-xp 00000000 09:02 739 /lib/libz.so.1.2.3
b7c00000-b7c01000 r--p 00012000 09:02 739 /lib/libz.so.1.2.3
b7c01000-b7c02000 rw-p 00013000 09:02 739 /lib/libz.so.1.2.3
b7c02000-b7dd1000 r-xp 00000000 fd:04 13044778 /usr/lib/mysql/libmysqlclient.so.16.0.0
b7dd1000-b7dd5000 r--p 001ce000 fd:04 13044778 /usr/lib/mysql/libmysqlclient.so.16.0.0
b7dd5000-b7e1c000 rw-p 001d2000 fd:04 13044778 /usr/lib/mysql/libmysqlclient.so.16.0.0
b7e1c000-b7e1d000 rw-p 00000000 00:00 0
b7e1d000-b7e2b000 r-xp 00000000 fd:04 30497984 /usr/lib/dspam/libmysql_drv.so.7.0.0
b7e2b000-b7e2c000 r--p 0000d000 fd:04 30497984 /usr/lib/dspam/libmysql_drv.so.7.0.0
b7e2c000-b7e2d000 rw-p 0000e000 fd:04 30497984 /usr/lib/dspam/libmysql_drv.so.7.0.0
b7e2d000-b7e37000 r-xp 00000000 09:02 14910 /lib/libnss_files-2.10.1.so
b7e37000-b7e38000 r--p 00009000 09:02 14910 /lib/libnss_files-2.10.1.so
b7e38000-b7e39000 rw-p 0000a000 09:02 14910 /lib/libnss_files-2.10.1.so
b7e39000-b7e3b000 rw-p 00000000 00:00 0
b7e3b000-b7e3d000 r-xp 00000000 09:02 20856 /lib/libdl-2.10.1.so
b7e3d000-b7e3e000 r--p 00001000 09:02 20856 /lib/libdl-2.10.1.so
b7e3e000-b7e3f000 rw-p 00002000 09:02 20856 /lib/libdl-2.10.1.so
b7e3f000-b7e63000 r-xp 00000000 09:02 14902 /lib/libm-2.10.1.so
b7e63000-b7e64000 r--p 00023000 09:02 14902 /lib/libm-2.10.1.so
b7e64000-b7e65000 rw-p 00024000 09:02 14902 /lib/libm-2.10.1.so
b7e65000-b7fa6000 r-xp 00000000 09:02 20854 /lib/libc-2.10.1.so
b7fa6000-b7fa8000 r--p 00141000 09:02 20854 /lib/libc-2.10.1.so
b7fa8000-b7fa9000 rw-p 00143000 09:02 20854 /lib/libc-2.10.1.so
b7fa9000-b7fac000 rw-p 00000000 00:00 0
b7fac000-b7fc0000 r-xp 00000000 09:02 20858 /lib/libpthread-2.10.1.so
b7fc0000-b7fc1000 r--p 00014000 09:02 20858 /lib/libpthread-2.10.1.so
b7fc1000-b7fc2000 rw-p 00015000 09:02 20858 /lib/libpthread-2.10.1.so
b7fc2000-b7fc4000 rw-p 00000000 00:00 0
b7fc4000-b7fdb000 r-xp 00000000 fd:04 29774232 /usr/lib/libdspam.so.7.0.0
b7fdb000-b7fdc000 r--p 00016000 fd:04 29774232 /usr/lib/libdspam.so.7.0.0
b7fdc000-b7fdd000 rw-p 00017000 fd:04 29774232 /usr/lib/libdspam.so.7.0.0
b7fdd000-b7fde000 rw-p 00000000 00:00 0
b7fe6000-b8002000 r-xp 00000000 09:02 16713 /lib/ld-2.10.1.so
b8002000-b8003000 r--p 0001b000 09:02 16713 /lib/ld-2.10.1.so
b8003000-b8004000 rw-p 0001c000 09:02 16713 /lib/ld-2.10.1.so
bfe2b000-bfe40000 rw-p 00000000 00:00 0 [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0 [vdso]
Aborted
theia vuadmin #
-----

BINGO! Let's talk in a constructive way about that issue. Okay? Message is
tagged as Virus and DSPAM does NOT learn the message (it has delegated the
control to ClamAV and ClamAV said it is a Virus so no need for DSPAM to
learn anything about it). So no tokens are available. But the signature is
written to the database and just holding an uninitialized token.

Assuming now that the classification from ClamAV was wrong and we want to
relearn the message

Scenario 1:
To be able to reclassify the message we would need to instruct DSPAM to
tokenize the message and add the signature regardless of what ClamAV said.
Without that we are not able to learn anything (since we just have an empty
token).

Scenario 2:
We allow reclassification BUT we just switch tags in log and DON'T do any
training. So it is more or less visual cosmetics but nothing is really
processed.

I personally would go for 2 since we handed out control to ClamAV and if
ClamAV thinks it is a virus then we save ourselves time by not tokenizing
the message and just flag the message as Virus. If now a reclassification
happens then we just capture that the token is empty and log the
reclassification but don't do any real training.

What do you think?

btw: That thing should not have crashed!
I have to crawl the code and capture and fix that issue anyway.
btw2: Good that you found that problem.


Kind Regards from Switzerland

Stevan Bajic

----------------------------------------------------------------------

Comment By: Carlo Rodrigues (csmr)
Date: 2009-07-24 19:29

Message:
What I found out, by looking at the data field for that signature on
dspam_signature_data, is that it is stored as "\0\0\0\0\0\0\0\0" (eight null
chars).
I tried retraining with another signature in that condition, and it crashed
as well.
Do you have some cases like this, where the data is 8 nulls?
I found 26 cases in my db, but I can't say whether they were all the result
of a virus message scanned by clamav, because I wiped my system.log.

I'll train my db from scratch again and try to find these kinds of cases,
retrain them, and post the results.

Cheers,
Carlo Rodrigues

----------------------------------------------------------------------

Comment By: Carlo Rodrigues (csmr)
Date: 2009-07-24 19:12

Message:
If I disable clamav scanning, then (my) dspam treats the original message
as innocent, and retrains it without any problem...

----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2009-07-24 18:51

Message:
Does not crash here :(

I have however latest GIT release and GCC 4.4.0 and glibc 2.10.1:
-----
theia vuadmin # /lib/libc.so.6
GNU C Library stable release version 2.10.1, by Roland McGrath et al.
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.4.0.
Compiled on a Linux >>2.6.30<< system on 2009-07-07.
Available extensions:
	C stubs add-on version 2.1.2
	crypt add-on version 2.1 by Michael Glad and others
	Gentoo patchset 1
	GNU Libidn by Simon Josefsson
	Native POSIX Threads Library by Ulrich Drepper et al
	BIND-8.2.3-T5B
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.
theia vuadmin # gcc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-4.4.0-r1/work/gcc-4.4.0/configure --prefix=/usr --bindir=/usr/i686-pc-linux-gnu/gcc-bin/4.4.0 --includedir=/usr/lib/gcc/i686-pc-linux-gnu/4.4.0/include --datadir=/usr/share/gcc-data/i686-pc-linux-gnu/4.4.0 --mandir=/usr/share/gcc-data/i686-pc-linux-gnu/4.4.0/man --infodir=/usr/share/gcc-data/i686-pc-linux-gnu/4.4.0/info --with-gxx-include-dir=/usr/lib/gcc/i686-pc-linux-gnu/4.4.0/include/g++-v4 --host=i686-pc-linux-gnu --build=i686-pc-linux-gnu --disable-altivec --disable-fixed-point --with-ppl --with-cloog --enable-nls --without-included-gettext --with-system-zlib --disable-checking --disable-werror --enable-secureplt --disable-multilib --enable-libmudflap --disable-libssp --enable-libgomp --enable-cld --disable-libgcj --with-arch=i686 --enable-languages=c,c++,fortran --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --with-bugurl=http://bugs.gentoo.org/ --with-pkgversion='Gentoo 4.4.0-r1 p1.2'
Thread model: posix
gcc version 4.4.0 (Gentoo 4.4.0-r1 p1.2)
theia vuadmin # mysql -V
mysql  Ver 14.14 Distrib 5.1.36, for pc-linux-gnu (i686) using readline 5.1
theia vuadmin #
-----

When doing that command on my setup:
-----
theia vuadmin # wget "https://sourceforge.net/tracker/download.php?group_id=250683&atid=1126467&file_id=336429&aid=2826644"
--2009-07-24 19:41:17--  https://sourceforge.net/tracker/download.php?group_id=250683&atid=1126467&file_id=336429&aid=2826644
Resolving sourceforge.net... 216.34.181.60
Connecting to sourceforge.net|216.34.181.60|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://sourceforge.net/tracker/download.php?group_id=250683&atid=1126467&file_id=336429&aid=2826644 [following]
--2009-07-24 19:41:18--  http://sourceforge.net/tracker/download.php?group_id=250683&atid=1126467&file_id=336429&aid=2826644
Connecting to sourceforge.net|216.34.181.60|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1203 (1.2K) [application/x-mimearchive]
Saving to: `download.php?group_id=250683&atid=1126467&file_id=336429&aid=2826644'

100%[===================================================================>] 1,203       --.-K/s   in 0s

2009-07-24 19:41:18 (51.3 MB/s) - `download.php?group_id=250683&atid=1126467&file_id=336429&aid=2826644' saved [1203/1203]

theia vuadmin # mv download.php\?group_id\=250683\&atid\=1126467\&file_id\=336429\&aid\=2826644 teste_virus.eml
theia vuadmin # dspam --process --user ste...@bajic.ch --deliver=summary < teste_virus.eml
X-DSPAM-Result: ste...@bajic.ch; result="Spam"; class="Spam"; probability=1.0000; confidence=0.70; signature=4,4a69f2a0151372646267052
theia vuadmin # dspam --class=innocent --source=error --signature=4,4a69f2a0151372646267052 --user ste...@bajic.ch
theia vuadmin # tail -n 4 /var/spool/dspam/data/s/t/stev...@bajic.ch/stev...@bajic.ch.log
1248457376 S Carlo Rodrigues <c...@net4b.pt> 4,4a69f2a0151372646267052 teste de virus (signature) Tagged <47a9ca7a.7080...@net4b.pt>
1248457486 F <None Specified> 4,4a69f2a0151372646267052 <None Specified> Retrained
theia vuadmin #
-----

BUT I do not use ClamAV inside DSPAM. Maybe that is a problem? Could you (if
possible) try once without having ClamAV active (just disable it in
dspam.conf) inside DSPAM? If that still crashes, then please try compiling
DSPAM without ClamAV, rerun, and see if that still crashes your DSPAM?


Kind Regards from Switzerland

Stevan Bajic

----------------------------------------------------------------------

Comment By: Carlo Rodrigues (csmr)
Date: 2009-07-24 18:30

Message:
Hello, Stevan.
I'm including the gdb output, as well as the offending message.
I'm using ClamAV, for the virus scanning.

Yes, it is reproducible. Just do the following:

$ dspam --process --user ad...@net4b.pt --deliver=summary < teste_virus.eml
X-DSPAM-Result: ad...@net4b.pt; result="Spam"; class="Virus"; probability=1.0000; confidence=1.00; signature=5,4a69e365308598763220631

$ dspam --class=innocent --source=error --signature=5,4a69e365308598763220631 --user ad...@net4b.pt
*** glibc detected *** dspam: double free or corruption (out): 0x00007fffe996fc00 ***

Cheers,
Carlo Rodrigues

----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2009-07-24 17:02

Message:
Hallo Carlos

A GDB trace would be more useful in this case. Can you make one and attach
it here? Is the error reproducible? If so then could you send me the message
or attach it here?


Kind Regards from Switzerland

Stevan Bajic

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=1126467&aid=2826644&group_id=250683

------------------------------------------------------------------------------
_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel