Bug Tracker item #2989984, was opened at 2010-04-20 18:38
Message generated for change (Settings changed) made by sbajic
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=1126467&aid=2989984&group_id=250683

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
>Category: None
Group: v3.9.0
>Status: Closed
>Resolution: Wont Fix
Priority: 5
Private: No
Submitted By: Olivier - interfaSys (interfasysuk)
Assigned to: Stevan Bajic (sbajic)
Summary: maintenance script: command run twice in a row?

Initial Comment:
The script generates this statement twice in a row:
SET timestamp=1271780703;
SELECT token,spam_hits,innocent_hits,unix_timestamp(last_hit) FROM dspam_token_data WHERE uid=14;

Since it's quite a costly operation, wouldn't it be possible to call it
only once?
This query must come from dspam_clean.

I'm calling it like this:
 /usr/local/bin/dspam_maintenance.sh --logdays=30 --signatures=30 --unused=60 --with-sql-optimization --verbose

and the result is
Enabled drivers are: mysql_drv
Running dspam_logrotate in the background
Active driver is: mysql_drv
Running dspam_clean ...
  * without purging old signatures


----------------------------------------------------------------------

>Comment By: Stevan Bajic (sbajic)
Date: 2010-05-13 17:50

Message:
Hello Olivier,

it has nothing to do with your configuration. It is simply how
dspam_clean works. Basically, dspam_clean does three things:
1) processing signatures
2) processing probabilities
3) processing unused tokens

(1) operates on signatures, while (2) and (3) both operate on tokens, so
(2) and (3) each iterate over all of the user's tokens. When (2) and (3)
start processing, each of them fetches the list of all tokens the user
has. This is why you see exactly the same query in MySQL twice (perhaps
not with the same result, but the query is still executed twice).
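To make the double query concrete, here is a minimal Python model of the
token phases described above. It is a sketch, not DSPAM code:
`CountingTokenStore`, `token_list` and `clean_user` are hypothetical names,
and each `token_list` call simply stands in for one
`SELECT token,... WHERE uid=N` round trip.

```python
class CountingTokenStore:
    """Hypothetical stand-in for a DSPAM storage driver (not the real API)."""

    def __init__(self, tokens):
        self.tokens = dict(tokens)    # token -> (spam_hits, innocent_hits)
        self.list_queries = 0         # how many full token-list queries ran

    def token_list(self, uid):
        # One call models one "SELECT token,... WHERE uid=N" round trip.
        self.list_queries += 1
        return list(self.tokens)


def clean_user(store, uid, purge_probabilities, purge_unused):
    """Run the token phases; (2) and (3) each fetch the token list afresh."""
    if purge_probabilities:                   # phase (2), "-p" in this thread
        for token in store.token_list(uid):
            pass                              # neutral-token checks go here
    if purge_unused:                          # phase (3), "-u" in this thread
        for token in store.token_list(uid):
            pass                              # unused-token checks go here
    return store.list_queries


store = CountingTokenStore({"a": (1, 0), "b": (0, 3)})
print(clean_user(store, 14, purge_probabilities=True, purge_unused=True))  # 2
```

With both phases enabled the model issues two list queries; with only the
neutral-token phase enabled it issues one.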

It is possible to change that behavior and avoid the second query, but
only when (2) did not delete any tokens. To get that functionality, three
changes would be needed: each storage driver would have to allow resetting
the cursor/position of the user's token list, dspam_clean would have to
issue a reset after (2), and the code that does the probability processing
would have to report whether or not it deleted any tokens.
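The change outlined above (a rewindable token list plus a phase (2) that
reports whether it deleted anything) could look roughly like the following.
This is a sketch under stated assumptions: `RewindableTokenStore`, `rewind`,
`purge_neutral` and `clean_user_with_rewind` are hypothetical names, and a
real storage driver would rewind a database cursor rather than keep a Python
list in memory.

```python
class RewindableTokenStore:
    """Hypothetical driver whose last token list can be rewound (not the real API)."""

    def __init__(self, tokens):
        self.tokens = dict(tokens)    # token -> (spam_hits, innocent_hits)
        self.list_queries = 0         # how many full token-list queries ran
        self._last = []

    def token_list(self, uid):
        # Models the expensive "SELECT token,... WHERE uid=N" round trip.
        self.list_queries += 1
        self._last = list(self.tokens)
        return self._last

    def rewind(self):
        # Reuse the rows fetched last time instead of querying again.
        return self._last


def purge_neutral(store, tokens, is_neutral):
    """Phase (2): delete neutral tokens and report whether any were deleted."""
    deleted = False
    for token in tokens:
        if is_neutral(store.tokens[token]):
            del store.tokens[token]
            deleted = True
    return deleted


def clean_user_with_rewind(store, uid, is_neutral, is_unused):
    tokens = store.token_list(uid)
    deleted = purge_neutral(store, tokens, is_neutral)
    # Re-query only if phase (2) changed the data; otherwise rewind.
    tokens = store.token_list(uid) if deleted else store.rewind()
    for token in tokens:                      # phase (3): unused tokens
        if is_unused(store.tokens[token]):
            del store.tokens[token]
    return store.list_queries
```

When `is_neutral` matches nothing, only one list query is issued; as soon as
phase (2) deletes a token, the list is fetched a second time, which is the
limitation of this approach.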

Right now I don't think I am going to add that kind of functionality to
the small dspam_clean utility. I absolutely understand your point that
avoiding this duplicate query would be good, but I am very short on time
and dspam_clean is not a tool that gets executed very often. Even if
someone did run it all the time, it would only run that query twice when
purging probabilities (aka: -p) AND unused tokens (aka: -u) at the same
time. And dspam_clean is not that time-sensitive: anyone running
dspam_clean is prepared for a longer runtime.

For now I would suggest that you change your dspam.conf, set
PurgeSignature, PurgeUnused, PurgeHapaxes, PurgeHits1S and PurgeHits1I to
"off", and get at least v1.19 of the maintenance script. With those
dspam.conf values and the newer maintenance script, dspam_clean is called
only for purging neutral tokens, so (1) and (3) are skipped and only (2)
runs. That means no double queries while using the maintenance script.

I will now change this bug to "Wont Fix". If you still want to see this
issue fixed in dspam_clean, I would suggest that you either open a feature
request or, even better, submit code that fixes it in all the storage
drivers and in dspam_clean.

I am leaving this bug report open for comments. Feel free to post back if
you want.


Kind Regards from Switzerland,

Stevan

----------------------------------------------------------------------

Comment By: Olivier - interfaSys (interfasysuk)
Date: 2010-05-12 12:24

Message:
I'm surprised that nobody else mentioned it, so it may have something to do
with my config, but I really can't see what.
Keep me posted :)
Olivier

----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2010-05-12 12:21

Message:
Okay. So this is probably a generic issue with dspam_clean. I need to look
at it. However, I think the issue will not be easy to handle because of the
multi-threaded nature of DSPAM. I will try to fix it anyway.

----------------------------------------------------------------------

Comment By: Olivier - interfaSys (interfasysuk)
Date: 2010-05-12 12:19

Message:
Stevan,
OK, I saw a releng 3.9.1; that's why I mentioned it ;)

I just installed the latest git version and ran the script. Same
"problem".
This is the output of the script.
Enabled drivers are: mysql_drv
Running dspam_logrotate in the background
Active driver is: mysql_drv
Running MySQL storage driver data cleanup
Running dspam_clean ...
  * with neutral token purging only

The SQL slow log looks the same.

----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2010-05-12 11:28

Message:
Hello Olivier, there is no 3.9.1. You should use trunk.

----------------------------------------------------------------------

Comment By: Olivier - interfaSys (interfasysuk)
Date: 2010-05-12 11:19

Message:
Stevan,
OK, I'll do it.
Should I use 3.9.1 or trunk?
Olivier

----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2010-05-12 11:06

Message:
Hello Olivier,

are you able to test the GIT version of DSPAM? Does it have the same
issue?

I am currently at work and cannot test. After work, however, I am going to
quickly test whether this issue is still present in the latest code and
whether it happens for every user or just a subset of users. I would
appreciate it if you could also test with the latest GIT code yourself.

Kind Regards from Switzerland,

Stevan

----------------------------------------------------------------------

Comment By: Olivier - interfaSys (interfasysuk)
Date: 2010-05-12 08:36

Message:
Hello Stevan,
I thought that was what you meant, but then I had doubts :D

I'm using the official release of V3.9
Olivier

----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2010-05-11 22:37

Message:
Hello Olivier, the script is not the issue. It's dspam_clean that is
responsible for the duplicate query. So the question is not whether you
are using the GIT version of the script. What I asked is whether you have
the GIT version of DSPAM. So which version of DSPAM are you running?

----------------------------------------------------------------------

Comment By: Olivier - interfaSys (interfasysuk)
Date: 2010-05-11 19:32

Message:
I thought it was executed twice only for this user, especially since it
takes 5 seconds to execute and the two runs are about 5 seconds apart.
However, I'm only logging slow queries, so this may simply be the only
query slow enough to be logged.

I'm using the git version of the script (v1.18) on FreeBSD 8.

----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2010-05-11 13:18

Message:
Is that query executed twice only for uid 14, or is this happening for
every user on your system?

Is this double query happening on DSPAM v3.9, or are you using GIT? If you
are using GIT: is it a recent GIT version or an older one? When did you
check it out?

----------------------------------------------------------------------

Comment By: Olivier - interfaSys (interfasysuk)
Date: 2010-05-11 08:12

Message:
I have enabled slow queries logging and here is what I see:
# Time: 100511  2:15:07
# u...@host: dspam[dspam] @ localhost []
# Query_time: 3.423395  Lock_time: 0.000083 Rows_sent: 0  Rows_examined: 2833062
SET timestamp=1273536907;
DELETE LOW_PRIORITY QUICK
  FROM t USING dspam_token_data t
    LEFT JOIN dspam_preferences p ON (p.preference = 'trainingMode' AND p.uid = t.uid)
    LEFT JOIN dspam_preferences d ON (d.preference = 'trainingMode' AND d.uid = 0)
  WHERE COALESCE(CONVERT(p.value USING latin1) COLLATE latin1_general_ci,CONVERT(d.value USING latin1) COLLATE latin1_general_ci,CONVERT(@TrainingMode USING latin1) COLLATE latin1_general_ci) NOT IN (_latin1 'TOE',_latin1 'TUM',_latin1 'NOTRAIN')
    AND from_days(@tod...@purgeunused) > last_hit;
# Time: 100511  2:15:43
# u...@host: dspam[dspam] @ localhost []
# Query_time: 4.951504  Lock_time: 0.000028 Rows_sent: 1056176  Rows_examined: 1056176
SET timestamp=1273536943;
SELECT token,spam_hits,innocent_hits,unix_timestamp(last_hit) FROM dspam_token_data WHERE uid=14;
# Time: 100511  2:15:49
# u...@host: dspam[dspam] @ localhost []
# Query_time: 5.543141  Lock_time: 0.000029 Rows_sent: 1056176  Rows_examined: 1056176
SET timestamp=1273536949;
SELECT token,spam_hits,innocent_hits,unix_timestamp(last_hit) FROM dspam_token_data WHERE uid=14;

As you can see, two identical queries are run one after the other, and
they only affect the globaluser.
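To check a slow log for back-to-back duplicates like these, a small Python
sketch can extract the statements and compare each one with its
predecessor. It assumes the layout MySQL writes, unwrapped ('#' header
lines, then `SET timestamp=N;`, then the statement, possibly spanning
several lines); `slow_log_statements` and `consecutive_duplicates` are
hypothetical helper names.

```python
import re


def slow_log_statements(text):
    """Return the SQL statements in a MySQL slow-query-log excerpt, in order.

    Header lines start with '#'; 'SET timestamp=N;' lines are skipped;
    a statement wrapped over several lines is joined back together.
    """
    statements, current = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            if current:                       # a header ends the statement
                statements.append(" ".join(current))
                current = []
        elif stripped and not stripped.startswith("SET timestamp="):
            current.append(stripped)
    if current:
        statements.append(" ".join(current))
    # Collapse whitespace runs so differently wrapped copies compare equal.
    return [re.sub(r"\s+", " ", s) for s in statements]


def consecutive_duplicates(statements):
    """Return (index, statement) for each statement equal to the previous one."""
    return [(i, s) for i, s in enumerate(statements)
            if i > 0 and s == statements[i - 1]]
```

Fed the two `uid=14` entries above, `consecutive_duplicates` reports one
duplicate at index 1. Keep in mind that a slow log only records queries
above the configured threshold, so a clean result does not rule out
duplicates among faster queries.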

----------------------------------------------------------------------

Comment By: Stevan Bajic (sbajic)
Date: 2010-04-22 01:10

Message:
Hello Olivier,

what do you mean by "twice in a row"? Could you post logs showing all
commands issued by the maintenance script?

What did you do to see that the SELECT was issued twice? Have you turned
on query logging in MySQL? Could you post this log, or at least a set of
the SQL queries that were executed, to illustrate that the query is
executed twice?

If you have the MySQL query log active, try calling dspam_clean manually
with the appropriate switches, then check again whether you see this
SELECT twice, and let me know if the query gets executed twice.

Stevan

----------------------------------------------------------------------


------------------------------------------------------------------------------

_______________________________________________
Dspam-devel mailing list
Dspam-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspam-devel
