-From Thomas, posted elsewhere
>My remaining question: what should be done with mails that
>reach the 'MaxAllowedHamDups' limit, without breaking any concept and without
>creating a new folder (which breaks several concepts)?
The scenario where a bonehead user sends 5000 copies of the same message in an
Outlook mailmerge isn't just a conceptual possibility; it happens. And
it's happening more and more frequently despite training, memos, reminders,
and a very good email blast system in place that eliminated the need for
mailmerges.
What if, during the nightly cleanup, you first deleted files with the same
name in excess of the maximum number of duplicates, and then, as you already
do, deleted files in excess of the maximum total number of files? I thought
that was already happening with the spam corpus, but apparently not.
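A minimal sketch of the two-pass cleanup proposed above might look like the following. It also includes the short-subject exemption suggested later in this thread. It is written in Python for illustration only and is not ASSP code. Several details are assumptions and do not come from ASSP: the hypothetical function name plan_cleanup, the "Subject--NNNNNN.txt" parsing rule (inferred from the Your_Donation_Receipt--123456.txt example), the use of file mtime as a file's age, and the min_subject_len default of 6.

```python
import re
from collections import defaultdict

# ASSUMED file-name layout, inferred from the example in this thread:
# "Your_Donation_Receipt--123456.txt" -> subject part "Your_Donation_Receipt".
NAME_RE = re.compile(r"^(?P<subject>.+)--\d+\.txt$")


def plan_cleanup(files, max_dups, max_files, min_subject_len=6):
    """Return the sorted names to delete from one collection (e.g. notspam).

    files: list of (name, mtime) tuples; a larger mtime means newer.
    Hypothetical helper for illustration, not ASSP's implementation.
    """
    doomed = set()

    # Pass 1: for each subject, keep only the newest max_dups files.
    groups = defaultdict(list)
    for name, mtime in files:
        m = NAME_RE.match(name)
        if not m:
            continue  # not a subject-named file; leave it to pass 2
        subject = m.group("subject")
        if len(subject) < min_subject_len:
            continue  # short subjects (e.g. RE, FWD) are exempt from the dup limit
        groups[subject].append((mtime, name))
    for entries in groups.values():
        entries.sort(reverse=True)  # newest first
        for _, name in entries[max_dups:]:
            doomed.add(name)

    # Pass 2: trim whatever survived pass 1 down to max_files, oldest first.
    survivors = sorted((mtime, name) for name, mtime in files if name not in doomed)
    excess = len(survivors) - max_files
    for _, name in survivors[:max(excess, 0)]:
        doomed.add(name)

    return sorted(doomed)
```

Suppose there are ten Your_Donation_Receipt files and two short "RE--…" files, with max_dups=3 and a generous max_files. In that case the sketch marks the seven oldest receipts for deletion and leaves the RE files alone. Running the dup pass before the total-count pass is what keeps one mass mailing from pushing older, more varied notspam files out of the collection.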
I see only upside in limiting the number of dups in notspam, but you've
stated elsewhere that the arguments here don't make sense to you. If
you're saying that what we suggest doesn't make any sense, I know we must
be missing something significant. I know that Bayesian filtering works
really well, but I only understand its inner workings from 35,000 feet. I
just can't understand how making every effort to ensure that our notspam
corpus remains diverse doesn't make sense.
Thanks again. Hope we can continue this discussion.
On Mon, Mar 14, 2016 at 5:28 PM, K Post <nntp.p...@gmail.com> wrote:
> One of our staff inadvertently sent about 3400 copies of the same test
> message out through our server. Okay, okay, it was me: I had coded a loop
> wrong, and before I noticed what was going on and could stop it, about 3400
> copies of the same message went out. Fortunately, they were all addressed
> to me. Sure enough, all 3400 were in notspam.
>
> So, can we keep discussing this, and does it make sense to?
>
> On Thu, Mar 10, 2016 at 1:47 PM, K Post <nntp.p...@gmail.com> wrote:
>
>> Isn't that exact same logic an argument for having the maximum number of
>> duplicate subjects apply to the HAM / notspam folder too? 5000 or 15000
>> copies of the same message sent individually by (untrainable / apathetic)
>> users would fill the notspam folder and mess up HMM / Bayesian, right?
>>
>> And for those RE / FWD / No subject emails, maybe we could have ASSP
>> ignore subjects shorter than, say, 5 or 6 characters when deleting duplicate
>> file names? Those files could then be wiped out oldest first during
>> maintenance.
>>
>> On Thu, Mar 10, 2016 at 11:18 AM, Thomas Eckardt <thomas.ecka...@thockar.com> wrote:
>>
>>> Just think about the logic behind Bayesian and HMM - this will answer
>>> your question.
>>>
>>> Having the same mail in the spam folder multiple times will score that
>>> content as extremely spam-heavy, even though your users also use the same
>>> content, just less often.
>>>
>>> Thomas
>>>
>>>
>>>
>>>
>>>
>>> From: K Post <nntp.p...@gmail.com>
>>> To: ASSP development mailing list <assp-test@lists.sourceforge.net>
>>> Date: 10.03.2016 16:58
>>> Subject: Re: [Assp-test] Max Number Duplicate File Names
>>>
>>>
>>>
>>> I know you're all about RTFM, but there are plenty of places in the GUI
>>> where the description isn't exactly clear or right. For example:
>>>
>>> MaxFiles
>>> If you're not using subjects as file names ( UseSubjectsAsMaillogNames ),
>>> this is the maximum number of files to keep in each collection (spam &
>>> nonspam)
>>> It's actually less than this -- files get a random number between 1 and
>>> MaxFiles.
>>>
>>> I AM using subjects as file names, and MaxFiles DOES control the maximum
>>> number of files in each collection when MaintBayesCollection is on and no
>>> max age is set, despite what the description says. The language is not
>>> clear, and that makes us assume things, sometimes incorrectly, about what
>>> the GUI really means. We've been working this way since ASSP came out.
>>> Because of this, I had no way of knowing that MaxAllowedDups >really< only
>>> applied to the spam collection. I assumed the GUI meant the whole log of
>>> spam and NOTspam. I don't think that's an unreasonable assumption. Call it
>>> an oversight or a mistake on my part, but none of that justifies an
>>> angry-sounding response from you.
>>>
>>> I'm not looking for a fight, but I feel like I have to keep justifying
>>> myself when you appear to be so angry with me, and with the rest of us who
>>> turn to you for enlightenment. You're carrying the entire weight of this
>>> project on your shoulders. It's a lot, I know. Can we move on and have a
>>> reasonable discussion here?
>>>
>>> Is there a reason that MaxAllowedDups shouldn't also apply to the notspam
>>> collection? Shouldn't we want that to be the case for the same reason
>>> that we have it for spam? Maybe also to the errors collections?
>>>
>>> If we don't, wouldn't a staff member sending the same basic message to
>>> 5000 people (against my wishes, but I can't control everything) push a
>>> third of the other notspam messages out of the rebuild process? What
>>> about if 20k messages are sent?
>>>
>>> Maybe I'm just not understanding, and that's why I'm asking, but I hope it
>>> doesn't result in any more scolding.
>>>
>>> Thank you
>>>
>>>
>>> On Thu, Mar 10, 2016 at 4:15 AM, Thomas Eckardt <thomas.ecka...@thockar.com> wrote:
>>>
>>> > >There are about 600 of those files in NotSpam.
>>> >
>>> > 'MaxAllowedDups','Max Number of Duplicate File Names'
>>> > 'The maximum number of logged files with the same filename (subject)
>>> > that are stored in the spam folder (spamlog),........
>>> >
>>> > I'll write in Hebrew; possibly the English will be better if you
>>> > translate it back to English.
>>> >
>>> > Thomas
>>> >
>>> >
>>> >
>>> > From: K Post <nntp.p...@gmail.com>
>>> > To: ASSP development mailing list <assp-test@lists.sourceforge.net>
>>> > Date: 10.03.2016 00:29
>>> > Subject: [Assp-test] Max Number Duplicate File Names
>>> >
>>> >
>>> >
>>> > I've got UseSubjectAsMaillogNames checked (the messages are stored in the
>>> > folders using the subject name followed by a 6-digit number, as expected).
>>> >
>>> > I've got MaxAllowedDups set to 3
>>> >
>>> > MaxBayesFileAge is 0
>>> > MaxFiles is 15000
>>> >
>>> > I'm noticing that MaxAllowedDups doesn't seem to be working.
>>> >
>>> > For example, a couple users often send emails with the subject
>>> > "Your Donation Receipt"
>>> > There are about 600 of those files in NotSpam.
>>> > Your_Donation_Receipt--123456.txt
>>> > where 123456 is a random differing number.
>>> >
>>> > Shouldn't only 3 of these files exist in the folder (with the exception of
>>> > those that were sent since the rebuild / maintenance window)?
>>> >
>>> > Thanks
>>> >
>>> >
>>>
>>> ------------------------------------------------------------------------------
>>> > Transform Data into Opportunity.
>>> > Accelerate data analysis in your applications with
>>> > Intel Data Analytics Acceleration Library.
>>> > Click to learn more.
>>> > http://pubads.g.doubleclick.net/gampad/clk?id=278785111&iu=/4140
>>> > _______________________________________________
>>> > Assp-test mailing list
>>> > Assp-test@lists.sourceforge.net
>>> > https://lists.sourceforge.net/lists/listinfo/assp-test
>>> >
>>> >
>>> >
>>> >
>>> > DISCLAIMER:
>>> > *******************************************************
>>> > This email and any files transmitted with it may be confidential, legally
>>> > privileged and protected in law and are intended solely for the use of the
>>> > individual to whom it is addressed.
>>> > This email was multiple times scanned for viruses. There should be no
>>> > known virus in this email!
>>> > *******************************************************
>>> >
>>> >
>>> >
>>> >
>>
>