http://bugzilla.spamassassin.org/show_bug.cgi?id=3776





------- Additional Comments From [EMAIL PROTECTED]  2004-10-09 08:35 -------
>>  I wonder if this would also apply to other 
>> binary-in-the-body stuff, like old uuencoded 
>> binaries in the body.

Hi Loren, the MIME decoder in SA3 should strip out inline uuencoded content
just as it does for rfc822 attachments.  The problem here is that this message
has a malformed MIME structure, so SA3 and the other tools I have tried
(ripMIME) decode it exactly as they see it.

>> "It seems that the entire message body is being 
>> passed in to create_lm, not just the malformed MIME 
>> part that MUAs display in the body."

Sidney, isn't that what I said?  I agree that your proposed patch is the better
way to go: limit the total size that can be scanned rather than the total
number of iterations of the for loop.
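
To make the idea concrete, here is a minimal sketch of capping the input size
before building the n-gram profile.  It is written in Python purely for
illustration and is not the Perl code in TextCat or the actual patch; the
names MAX_INPUT_CHARS and TOP_NGRAMS, their values, and the tokenizing details
are all assumptions on my part.

```python
import re
from collections import Counter

MAX_INPUT_CHARS = 10_000  # hypothetical cap, not the value used by the real patch
TOP_NGRAMS = 400          # hypothetical profile size


def create_lm(text, max_chars=MAX_INPUT_CHARS, top=TOP_NGRAMS):
    """Build a ranked n-gram profile from at most max_chars of text."""
    # Truncate first, so every later step (counting, sorting, slicing)
    # works on a bounded amount of data.
    text = text[:max_chars]

    counts = Counter()
    for word in re.split(r"[^A-Za-z]+", text):
        if not word:
            continue
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, 6):  # n-grams of length 1 to 5
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1

    # Sort by descending frequency (ties broken alphabetically) and keep
    # only the most frequent entries.
    ranked = sorted(counts, key=lambda gram: (-counts[gram], gram))
    return ranked[:top]
```

Capping the input rather than the loop count bounds the size of the array that
gets sorted, trimmed, and returned, not just the time spent counting.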

>> "Also, the slowness of that loop in TextCat doesn't 
>> explain the memory blowup."

Sidney, after patching the create_lm call, I don't see any big increases in
memory consumption.  The for loop takes about 6 seconds because it's doing 28k
iterations.  The ngrams sort in the else branch takes 4-5 seconds.  The splice
(steps 5->6) takes 10-11 seconds.  The return back to classify() takes 4-5
seconds!  It's not the for loop that causes the big increases in memory; it's
the sorts and splices, and then having to return that big array back to
classify().

2004-10-09 10:31:41.246016500 debug: generic: going to textcat matches
2004-10-09 10:31:41.247064500 debug: generic: running TextCat::classify
2004-10-09 10:31:44.999990500 debug: generic: count was 28758
2004-10-09 10:31:45.000113500 debug: generic: 3 else sort ngrams
2004-10-09 10:31:49.693734500 debug: generic: 4 else sort ngrams is done
2004-10-09 10:31:49.693854500 debug: generic: 5 splice sorted
2004-10-09 10:31:59.008898500 debug: generic: 6 splice sorted is done
2004-10-09 10:31:59.009022500 debug: generic: 7 return sorted to classify()
2004-10-09 10:32:03.020713500 debug: generic: done running create_lm
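
If you want to pull the per-step times out of a trace like the one above
without doing the subtraction by hand, something along these lines should work.
This is only a sketch in Python; the helper names parse_ts and step_durations
are made up for this example and the trace is pasted in verbatim.

```python
from datetime import datetime

TRACE = """\
2004-10-09 10:31:41.246016500 debug: generic: going to textcat matches
2004-10-09 10:31:41.247064500 debug: generic: running TextCat::classify
2004-10-09 10:31:44.999990500 debug: generic: count was 28758
2004-10-09 10:31:45.000113500 debug: generic: 3 else sort ngrams
2004-10-09 10:31:49.693734500 debug: generic: 4 else sort ngrams is done
2004-10-09 10:31:49.693854500 debug: generic: 5 splice sorted
2004-10-09 10:31:59.008898500 debug: generic: 6 splice sorted is done
2004-10-09 10:31:59.009022500 debug: generic: 7 return sorted to classify()
2004-10-09 10:32:03.020713500 debug: generic: done running create_lm
"""


def parse_ts(line):
    """Parse the leading timestamp of a log line."""
    date, time = line.split()[:2]
    # strptime's %f accepts at most 6 fractional digits, so the
    # nanosecond field is truncated to microseconds.
    return datetime.strptime(f"{date} {time[:15]}", "%Y-%m-%d %H:%M:%S.%f")


def step_durations(trace):
    """Return (message, seconds until the next log line) pairs."""
    lines = [ln for ln in trace.splitlines() if ln.strip()]
    result = []
    for current, following in zip(lines, lines[1:]):
        message = current.split("generic: ", 1)[1]
        elapsed = (parse_ts(following) - parse_ts(current)).total_seconds()
        result.append((message, elapsed))
    return result


for message, seconds in step_durations(TRACE):
    print(f"{seconds:8.3f}s  {message}")
```

Each printed figure is the time between one debug line and the next, so the
number next to "5 splice sorted" is the time spent in the splice itself.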

After patching create_lm(), the sorts and splices are very fast:

2004-10-09 10:30:07.046344500 debug: generic: going to textcat matches
2004-10-09 10:30:07.047347500 debug: generic: running TextCat::classify
2004-10-09 10:30:07.413029500 debug: generic: count was 2501
2004-10-09 10:30:07.413138500 debug: generic: 3 else sort ngrams
2004-10-09 10:30:07.568822500 debug: generic: 4 else sort ngrams is done
2004-10-09 10:30:07.568935500 debug: generic: 5 splice sorted
2004-10-09 10:30:07.591153500 debug: generic: 6 splice sorted is done
2004-10-09 10:30:07.591260500 debug: generic: 7 return sorted to classify()
2004-10-09 10:30:07.651056500 debug: generic: done running create_lm

I am taking a weekend vacation with my wife right now, so I'm not sure I can
continue on this until Monday.  I agree that a scan time of 4-6 seconds for
this message is still too slow, and we need to figure out what is causing that
next slowdown.

thanks. d






