Bug Tracker item #2826644 was opened at 2009-07-24 17:50
Message generated for change (Settings changed) made by sbajic
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=1126467&aid=2826644&group_id=250683

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: daemon
Group: v3.9.0
>Status: Closed
>Resolution: Fixed
Priority: 9
Private: No
Submitted By: Carlo Rodrigues (csmr)
Assigned to: Stevan Bajic (sbajic)
Summary: dspam crashes when training with signature

Initial Comment:
There was one message that I could not train via web-ui and always got an error.
I checked on dspam.debug for the problem.

$ /usr/bin/dspam --source=error --class=innocent 
--signature=5,4a64ad79114921676967107 --user ad...@net4b.pt
*** glibc detected *** /usr/bin/dspam: double free or corruption (out): 
0x00007fffec7d2740 *** 

Attached is the output of valgrind -v --show-reachable=yes.
If there is anything else needed, maybe the tokens, just ask.

Cheers,
Carlo Rodrigues

----------------------------------------------------------------------

>Comment By: Stevan Bajic (sbajic)
Date: 2009-07-25 13:50

Message:
Hallo Carlo

Please check out GIT commit 9a6cdaba9caa1b951ccce77382262da7964a3f83 and
post any success or failure on the ML or reopen this bug here if needed.

Kind Regards from Switzerland

Stevan Bajic

----------------------------------------------------------------------

Comment By: Carlo Rodrigues (csmr)
Date: 2009-07-25 00:52

Message:

> BINGO! Let's talk in a constructive way about that issue. Okay? The message
> is tagged as Virus and DSPAM does NOT learn the message (it has delegated
> control to ClamAV and ClamAV said it is a virus, so there is no need for
> DSPAM to learn anything about it). So no tokens are available. But the
> signature is written to the database and just holds an uninitialized token.
>
> Assuming now that the classification from ClamAV was wrong and we want to
> relearn the message:
>
> Scenario 1:
> To be able to reclassify the message we would need to instruct DSPAM to
> tokenize the message and add the signature regardless of what ClamAV said.
> Without that we are not able to learn anything (since we just have an empty
> token).
>
> Scenario 2:
> We allow reclassification BUT we just switch tags in the log and DON'T do
> any training. So it is more or less a cosmetic change, with nothing really
> processed.
>
> I personally would go for 2 since we handed control to ClamAV, and if
> ClamAV thinks it is a virus then we save time by not tokenizing the
> message and just flagging the message as Virus. If a reclassification
> happens later, then we just detect that the token is empty and log the
> reclassification but don't do any real training.
>
> What do you think?
>

Personally I see no need of tokenizing messages that are identified as
viruses.
If it isn't a virus and it was wrongly identified as being so by ClamAV
(It happened to me twice before, when testing dspam on a production domain)
and it is quarantined, then checking the message on the quarantine panel
and hitting "Deliver Checked" gets the job done. It really makes no
sense to try to (re)learn the message as ham, because the next time the same
message or attachment is sent, ClamAV will catch it again. DSPAM will never
let it through as long as ClamAV is there.

Scenario 2 looks OK to me. Maybe a bit confusing for the user, but no harm
can come from it.

Scenario 3 -> Not presenting the chance of retraining messages identified
as viruses. No checkbox, no "As Innocent" text.
I think this option would be the best one. But if it takes a lot of effort
to do this, and someone is working on a new web-ui, better choose scenario
2 for now.

> btw: That thing should not have crashed! I have to crawl through the code
> and catch and fix that issue anyway.
>

At least this time it crashes on your side as well :)

> btw2: Good that you found that problem.
>
>
> Kind Regards from Switzerland
>
> Stevan Bajic
>
>
Cheers,

Carlo Rodrigues

----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2009-07-24 22:16

Message:
Never seen that before, but now I went and recompiled DSPAM with ClamAV and
redid the test:
------
theia vuadmin # dspam --process --user ste...@bajic.ch --deliver=summary <
teste_virus.eml
X-DSPAM-Result: ste...@bajic.ch; result="Spam"; class="Virus";
probability=1.0000; confidence=1.00; signature=4,4a6a0faa21721695616434
theia vuadmin # mysql --user=root --password=$(cat
/mnt/gentoo.scripts/mysql.pwd) --batch -e "select
uid,signature,hex(data),length,created_on from
sysdb_dspam.dspam_signature_data where
signature='4,4a6a0faa21721695616434'\G"
*************************** 1. row ***************************
       uid: 4
 signature: 4,4a6a0faa21721695616434
 hex(data): 0000000000000000
    length: 8
created_on: 2009-07-24
theia vuadmin #
------

Okay. Now let's do the reclassification:
-----
theia vuadmin # dspam --class=innocent --source=error
--signature=4,4a6a0faa21721695616434 --user ste...@bajic.ch
*** glibc detected *** dspam: free(): invalid pointer: 0xbfe3e5ac ***
======= Backtrace: =========
/lib/libc.so.6[0xb7ed0fe1]
/lib/libc.so.6[0xb7ed270a]
/lib/libc.so.6(cfree+0x6d)[0xb7ed576d]
/usr/lib/libdspam.so.7(_ds_operate+0x6b0)[0xb7fccf77]
/usr/lib/libdspam.so.7(dspam_process+0x2c9)[0xb7fcd757]
dspam(retrain_message+0x197)[0x804e8b5]
dspam(process_message+0x282)[0x80529d4]
dspam(process_users+0xd3a)[0x8054c8c]
dspam(main+0x4a3)[0x8055b16]
/lib/libc.so.6(__libc_start_main+0xe6)[0xb7e7ba66]
dspam[0x804b971]
======= Memory map: ========
08048000-08060000 r-xp 00000000 fd:04 619760     /usr/bin/dspam
08060000-08061000 r--p 00017000 fd:04 619760     /usr/bin/dspam
08061000-08062000 rw-p 00018000 fd:04 619760     /usr/bin/dspam
08062000-080d0000 rw-p 00000000 00:00 0          [heap]
b7a00000-b7a21000 rw-p 00000000 00:00 0
b7a21000-b7b00000 ---p 00000000 00:00 0
b7bd1000-b7beb000 r-xp 00000000 fd:04 4662652    /usr/lib/gcc/i686-pc-linux-gnu/4.4.0/libgcc_s.so.1
b7beb000-b7bec000 r--p 00019000 fd:04 4662652    /usr/lib/gcc/i686-pc-linux-gnu/4.4.0/libgcc_s.so.1
b7bec000-b7bed000 rw-p 0001a000 fd:04 4662652    /usr/lib/gcc/i686-pc-linux-gnu/4.4.0/libgcc_s.so.1
b7bed000-b7c00000 r-xp 00000000 09:02 739        /lib/libz.so.1.2.3
b7c00000-b7c01000 r--p 00012000 09:02 739        /lib/libz.so.1.2.3
b7c01000-b7c02000 rw-p 00013000 09:02 739        /lib/libz.so.1.2.3
b7c02000-b7dd1000 r-xp 00000000 fd:04 13044778   /usr/lib/mysql/libmysqlclient.so.16.0.0
b7dd1000-b7dd5000 r--p 001ce000 fd:04 13044778   /usr/lib/mysql/libmysqlclient.so.16.0.0
b7dd5000-b7e1c000 rw-p 001d2000 fd:04 13044778   /usr/lib/mysql/libmysqlclient.so.16.0.0
b7e1c000-b7e1d000 rw-p 00000000 00:00 0
b7e1d000-b7e2b000 r-xp 00000000 fd:04 30497984   /usr/lib/dspam/libmysql_drv.so.7.0.0
b7e2b000-b7e2c000 r--p 0000d000 fd:04 30497984   /usr/lib/dspam/libmysql_drv.so.7.0.0
b7e2c000-b7e2d000 rw-p 0000e000 fd:04 30497984   /usr/lib/dspam/libmysql_drv.so.7.0.0
b7e2d000-b7e37000 r-xp 00000000 09:02 14910      /lib/libnss_files-2.10.1.so
b7e37000-b7e38000 r--p 00009000 09:02 14910      /lib/libnss_files-2.10.1.so
b7e38000-b7e39000 rw-p 0000a000 09:02 14910      /lib/libnss_files-2.10.1.so
b7e39000-b7e3b000 rw-p 00000000 00:00 0
b7e3b000-b7e3d000 r-xp 00000000 09:02 20856      /lib/libdl-2.10.1.so
b7e3d000-b7e3e000 r--p 00001000 09:02 20856      /lib/libdl-2.10.1.so
b7e3e000-b7e3f000 rw-p 00002000 09:02 20856      /lib/libdl-2.10.1.so
b7e3f000-b7e63000 r-xp 00000000 09:02 14902      /lib/libm-2.10.1.so
b7e63000-b7e64000 r--p 00023000 09:02 14902      /lib/libm-2.10.1.so
b7e64000-b7e65000 rw-p 00024000 09:02 14902      /lib/libm-2.10.1.so
b7e65000-b7fa6000 r-xp 00000000 09:02 20854      /lib/libc-2.10.1.so
b7fa6000-b7fa8000 r--p 00141000 09:02 20854      /lib/libc-2.10.1.so
b7fa8000-b7fa9000 rw-p 00143000 09:02 20854      /lib/libc-2.10.1.so
b7fa9000-b7fac000 rw-p 00000000 00:00 0
b7fac000-b7fc0000 r-xp 00000000 09:02 20858      /lib/libpthread-2.10.1.so
b7fc0000-b7fc1000 r--p 00014000 09:02 20858      /lib/libpthread-2.10.1.so
b7fc1000-b7fc2000 rw-p 00015000 09:02 20858      /lib/libpthread-2.10.1.so
b7fc2000-b7fc4000 rw-p 00000000 00:00 0
b7fc4000-b7fdb000 r-xp 00000000 fd:04 29774232   /usr/lib/libdspam.so.7.0.0
b7fdb000-b7fdc000 r--p 00016000 fd:04 29774232   /usr/lib/libdspam.so.7.0.0
b7fdc000-b7fdd000 rw-p 00017000 fd:04 29774232   /usr/lib/libdspam.so.7.0.0
b7fdd000-b7fde000 rw-p 00000000 00:00 0
b7fe6000-b8002000 r-xp 00000000 09:02 16713      /lib/ld-2.10.1.so
b8002000-b8003000 r--p 0001b000 09:02 16713      /lib/ld-2.10.1.so
b8003000-b8004000 rw-p 0001c000 09:02 16713      /lib/ld-2.10.1.so
bfe2b000-bfe40000 rw-p 00000000 00:00 0          [stack]
ffffe000-fffff000 r-xp 00000000 00:00 0          [vdso]
Aborted
theia vuadmin #
-----
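
One detail in the trace above can be checked mechanically: the pointer glibc
rejects, 0xbfe3e5ac, falls inside the [stack] mapping (bfe2b000-bfe40000) and
not inside [heap], which suggests that free() was handed a stack address. A
minimal sketch of that range check in C, where map_line_contains is a
hypothetical helper written for illustration and not part of DSPAM:

```c
#include <stdio.h>

/* Parse one line of a /proc/<pid>/maps-style memory map
 * ("start-end perms offset dev inode [path]") and report whether
 * addr falls inside the half-open range [start, end).
 * Returns 1 if inside, 0 if outside, -1 if the line cannot be parsed.
 * Hypothetical helper for illustration only. */
int map_line_contains(const char *line, unsigned long long addr)
{
    unsigned long long start, end;

    /* Both range bounds are printed as bare hexadecimal numbers. */
    if (sscanf(line, "%llx-%llx", &start, &end) != 2)
        return -1;
    return addr >= start && addr < end;
}
```

Feeding each line of the memory map to this helper together with the address
from the "invalid pointer" message shows which mapping the pointer belongs to.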

BINGO! Let's talk in a constructive way about that issue. Okay? The message
is tagged as Virus and DSPAM does NOT learn the message (it has delegated
control to ClamAV and ClamAV said it is a virus, so there is no need for
DSPAM to learn anything about it). So no tokens are available. But the
signature is written to the database and just holds an uninitialized token.

Assuming now that the classification from ClamAV was wrong and we want to
relearn the message:

Scenario 1:
To be able to reclassify the message we would need to instruct DSPAM to
tokenize the message and add the signature regardless of what ClamAV said.
Without that we are not able to learn anything (since we just have an empty
token).

Scenario 2:
We allow reclassification BUT we just switch tags in the log and DON'T do
any training. So it is more or less a cosmetic change, with nothing really
processed.

I personally would go for 2 since we handed control to ClamAV, and if
ClamAV thinks it is a virus then we save time by not tokenizing the
message and just flagging the message as Virus. If a reclassification
happens later, then we just detect that the token is empty and log the
reclassification but don't do any real training.

What do you think?
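
Scenario 2 could be sketched roughly as follows in C. This is only an
illustration under stated assumptions: the function and enum names are
hypothetical and not taken from the DSPAM source, and it assumes that an
all-zero blob (like the 8-byte hex(data) value shown above) means no tokens
were stored.

```c
#include <stddef.h>

/* Return 1 if the stored signature blob carries no token data:
 * a NULL or zero-length blob, or one made up only of zero bytes.
 * ASSUMPTION: all-zero data means "no tokens were stored". */
int signature_blob_is_empty(const unsigned char *data, size_t length)
{
    size_t i;

    if (data == NULL || length == 0)
        return 1;
    for (i = 0; i < length; i++) {
        if (data[i] != 0)
            return 0;
    }
    return 1;
}

/* Hypothetical outcome of a retrain request. */
enum retrain_action {
    RETRAIN_LOG_ONLY, /* record the reclassification, skip training */
    RETRAIN_FULL      /* tokens are present, train as usual */
};

/* Decide what to do before any token memory is touched or freed. */
enum retrain_action choose_retrain_action(const unsigned char *data,
                                          size_t length)
{
    return signature_blob_is_empty(data, length) ? RETRAIN_LOG_ONLY
                                                 : RETRAIN_FULL;
}
```

The point of the design is that the check runs first, so the empty case never
reaches the code path that trains on, and later frees, token data.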

btw: That thing should not have crashed! I have to crawl through the code
and catch and fix that issue anyway.

btw2: Good that you found that problem.


Kind Regards from Switzerland

Stevan Bajic

----------------------------------------------------------------------

Comment By: Carlo Rodrigues (csmr)
Date: 2009-07-24 20:29

Message:
What I found out, by looking at the data field for that signature in
dspam_signature_data, is that it is stored as "\0\0\0\0\0\0\0\0" (eight null
chars). I tried retraining with another signature in that condition, and it
crashed as well. Do you have some cases like this, where the data is 8
nulls? I found 26 cases in my db, but I can't say whether they were all the
result of a virus message scanned by ClamAV, because I wiped my system.log.
I'll train my db from scratch again, try to find these kinds of cases,
retrain them, and post the results.

Cheers,
Carlo Rodrigues

----------------------------------------------------------------------

Comment By: Carlo Rodrigues (csmr)
Date: 2009-07-24 20:12

Message:
If I disable clamav scanning, then (my) dspam treats the original message
as innocent, and retrains it without any problem... 

----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2009-07-24 19:51

Message:
Does not crash here :(

However, I have the latest GIT release, GCC 4.4.0 and glibc 2.10.1:
-----
theia vuadmin # /lib/libc.so.6
GNU C Library stable release version 2.10.1, by Roland McGrath et al.
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.4.0.
Compiled on a Linux >>2.6.30<< system on 2009-07-07.
Available extensions:
        C stubs add-on version 2.1.2
        crypt add-on version 2.1 by Michael Glad and others
        Gentoo patchset 1
        GNU Libidn by Simon Josefsson
        Native POSIX Threads Library by Ulrich Drepper et al
        BIND-8.2.3-T5B
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.
theia vuadmin # gcc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with:
/var/tmp/portage/sys-devel/gcc-4.4.0-r1/work/gcc-4.4.0/configure
--prefix=/usr --bindir=/usr/i686-pc-linux-gnu/gcc-bin/4.4.0
--includedir=/usr/lib/gcc/i686-pc-linux-gnu/4.4.0/include
--datadir=/usr/share/gcc-data/i686-pc-linux-gnu/4.4.0
--mandir=/usr/share/gcc-data/i686-pc-linux-gnu/4.4.0/man
--infodir=/usr/share/gcc-data/i686-pc-linux-gnu/4.4.0/info
--with-gxx-include-dir=/usr/lib/gcc/i686-pc-linux-gnu/4.4.0/include/g++-v4
--host=i686-pc-linux-gnu --build=i686-pc-linux-gnu --disable-altivec
--disable-fixed-point --with-ppl --with-cloog --enable-nls
--without-included-gettext --with-system-zlib --disable-checking
--disable-werror --enable-secureplt --disable-multilib --enable-libmudflap
--disable-libssp --enable-libgomp --enable-cld --disable-libgcj
--with-arch=i686 --enable-languages=c,c++,fortran --enable-shared
--enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
--with-bugurl=http://bugs.gentoo.org/ --with-pkgversion='Gentoo 4.4.0-r1
p1.2'
Thread model: posix
gcc version 4.4.0 (Gentoo 4.4.0-r1 p1.2)
theia vuadmin # mysql -V
mysql  Ver 14.14 Distrib 5.1.36, for pc-linux-gnu (i686) using readline
5.1
theia vuadmin #
-----

When doing that command on my setup:
-----
theia vuadmin # wget
"https://sourceforge.net/tracker/download.php?group_id=250683&atid=1126467&file_id=336429&aid=2826644";
--2009-07-24 19:41:17-- 
https://sourceforge.net/tracker/download.php?group_id=250683&atid=1126467&file_id=336429&aid=2826644
Resolving sourceforge.net... 216.34.181.60
Connecting to sourceforge.net|216.34.181.60|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location:
http://sourceforge.net/tracker/download.php?group_id=250683&atid=1126467&file_id=336429&aid=2826644
[following]
--2009-07-24 19:41:18-- 
http://sourceforge.net/tracker/download.php?group_id=250683&atid=1126467&file_id=336429&aid=2826644
Connecting to sourceforge.net|216.34.181.60|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1203 (1.2K) [application/x-mimearchive]
Saving to:
`download.php?group_id=250683&atid=1126467&file_id=336429&aid=2826644'

100%[===================================================================>]
1,203       --.-K/s   in 0s

2009-07-24 19:41:18 (51.3 MB/s) -
`download.php?group_id=250683&atid=1126467&file_id=336429&aid=2826644' saved
[1203/1203]

theia vuadmin # mv
download.php\?group_id\=250683\&atid\=1126467\&file_id\=336429\&aid\=2826644
teste_virus.eml
theia vuadmin # dspam --process --user ste...@bajic.ch --deliver=summary <
teste_virus.eml
X-DSPAM-Result: ste...@bajic.ch; result="Spam"; class="Spam";
probability=1.0000; confidence=0.70; signature=4,4a69f2a0151372646267052
theia vuadmin # dspam --class=innocent --source=error
--signature=4,4a69f2a0151372646267052 --user ste...@bajic.ch
theia vuadmin # tail -n 4
/var/spool/dspam/data/s/t/stev...@bajic.ch/stev...@bajic.ch.log
1248457376      S       Carlo Rodrigues <c...@net4b.pt>
4,4a69f2a0151372646267052       teste de virus (signature)  Tagged 
<47a9ca7a.7080...@net4b.pt>

1248457486      F       <None Specified>        4,4a69f2a0151372646267052 
   <None Specified>  Retrained

theia vuadmin #
-----


BUT I do not use ClamAV inside DSPAM. Maybe that is the problem? Could you
(if possible) try once with ClamAV disabled inside DSPAM (just disable it in
dspam.conf)? If that still crashes, then please try compiling DSPAM without
ClamAV, rerun the test, and see if that still crashes your DSPAM.


Kind Regards from Switzerland

Stevan Bajic

----------------------------------------------------------------------

Comment By: Carlo Rodrigues (csmr)
Date: 2009-07-24 19:30

Message:
Hello, Stevan.

I'm including the gdb output, as well as the offending message.
I'm using ClamAV for virus scanning.

Yes, it is reproducible. Just do the following:

$ dspam --process --user ad...@net4b.pt --deliver=summary <
teste_virus.eml 
X-DSPAM-Result: ad...@net4b.pt; result="Spam"; class="Virus";
probability=1.0000; confidence=1.00; signature=5,4a69e365308598763220631

$ dspam --class=innocent --source=error
--signature=5,4a69e365308598763220631 --user ad...@net4b.pt
*** glibc detected *** dspam: double free or corruption (out):
0x00007fffe996fc00 ***

Cheers,
Carlo Rodrigues

----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2009-07-24 18:02

Message:
Hallo Carlo

A GDB trace would be more useful in this case. Can you make one and attach
it here? Is the error reproducible? If so then could you send me the
message or attach it here?

Kind Regards from Switzerland

Stevan Bajic

----------------------------------------------------------------------
