On Thu, Dec 3, 2015 at 5:57 PM, Craig L Russell <[email protected]> wrote:
>
>> On Dec 3, 2015, at 2:51 PM, Sam Ruby <[email protected]> wrote:
>>
>> On Thu, Dec 3, 2015 at 4:44 PM, Craig L Russell
>> <[email protected]> wrote:
>>> We should reject messages with attachments of single .jpg or .gif files.
>>> Documents that need to be stapled are never a single page.
>>
>> Counter example:
>>
>> Date: Fri, 27 Mar 2015 18:33:39 +0000
>> name: asf-membership-application.jpg
>
> I suppose I could always add some more text to the membership application. ;-)
>>
>> (as this is a public list, I'll omit other identifying details)
>>
>> In related news, I'm making slow but steady progress on what
>> essentially is a rudimentary mbox browser at this point. At the
>> moment, the *only* filter it is applying is "does the email have
>> attachments?". This code is nearly ready to share. You should be
>> able to run it on your own machine -- I'll worry about deploying it on
>> whimsy later. The initial fetch and parsing of 8 years of email will
>> take about an hour. Subsequent fetches and parsing will only take
>> seconds. You will need about 11Gb of disk to store emails.
>>
>> Initially, we can use it to explore "what if" scenarios for
>> heuristics, though I will say that I'm growing increasingly skeptical.
>> If we can get discarding spam down to a single mouse click (with an
>> undo operation) and avoid putting spam in the repository at all, that
>> might be better than applying filters that may skip over legitimate
>> messages.
>
> Actually, the only thing missing from the current workbench is a (spam)
> button that would remember the subject of the commit message, which is
> currently “Faxes received”. So a (spam) button and the existing (commit)
> button with multiple spams committing with the same commit message would do
> the trick.

I'm seeing a few open issues (for example, WHIMSY-1, WHIMSY-2, WHIMSY-6,
WHIMSY-7). And the biggest problem is that the workbench has only a
single maintainer. I'd like to make it so that changes to it are as
easy as changes to the icla-lint tool, which includes making it easy
for people to run it themselves on their own machine.

The approach I'm suggesting will avoid spam ever being added to the
repository at all. This will become more clear when you can see the
tool for yourself.

>> Once we settle on what the rules are, adding in functions from the
>> existing secretary workbench should be straightforward.
>
> I think we’re down to just a few cases of mails that fail to be received, now
> that icla.pdf and icla.pdf.asc are working.

The fact that there is no corpus of test data for the existing secmail
tool continues to make any changes to that tool a crap shoot. And the
root problem here is that we can't predict in advance all of the weird
and wonderful ways that people will attempt to either legitimately
send us forms or spam us. So the best data we have is 8 years of
messages. I'm trying to make it easier for that data to be explored
and exploited.

If secmail and workbench can be integrated, perhaps online submission
of ICLA forms could also be a part of the overall package.

> Craig

- Sam Ruby

>> - Sam Ruby
>>
>>> Craig
>>>
>>>> Begin forwarded message:
>>>>
>>>> From: [email protected]
>>>> Subject: foundation: r63920 -
>>>> /documents/received/Johanna-info-badges-dbn.biz--image003.jpg
>>>> Date: December 3, 2015 at 10:03:23 AM PST
>>>> To: [email protected]
>>>> Reply-To: [email protected]
>>>>
>>>> Author: rubys
>>>> Date: Thu Dec 3 18:03:23 2015
>>>> New Revision: 63920
>>>>
>>>> Log:
>>>> Delete spam
>>>>
>>>> Removed:
>>>> documents/received/Johanna-info-badges-dbn.biz--image003.jpg
>>>>
>>>
>>> Craig L Russell
>>> Architect, Oracle
>>> http://db.apache.org/jdo
>>> 408 276-5638 mailto:[email protected]
>>> P.S. A good JDO? O, Gasp!
>>>
>
> Craig L Russell
> Architect, Oracle
> http://db.apache.org/jdo
> 408 276-5638 mailto:[email protected]
> P.S. A good JDO? O, Gasp!
>
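
For anyone who wants to experiment with the two ideas discussed above (the
"does the email have attachments?" filter and the "single .jpg or .gif"
heuristic), here is a minimal sketch using only the Python standard library.
It is not the code of the mbox browser mentioned in the thread; the function
names and the mbox path passed to scan_mbox are hypothetical and chosen only
for illustration.

```python
import mailbox
from email.message import Message

# Extensions treated as "image only" for the heuristic (an assumption;
# the thread mentions only .jpg and .gif).
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".gif")


def attachment_names(message: Message) -> list:
    """Return the filenames of every MIME part that declares one."""
    names = []
    for part in message.walk():
        filename = part.get_filename()
        if filename:
            names.append(filename)
    return names


def is_single_image(names: list) -> bool:
    """True when the only attachment is one .jpg/.jpeg/.gif file."""
    return len(names) == 1 and names[0].lower().endswith(IMAGE_EXTENSIONS)


def scan_mbox(path: str):
    """Yield (subject, attachment names, single-image flag) for each
    message in the mbox file that has at least one attachment."""
    # create=False: fail on a missing file instead of creating an empty one.
    box = mailbox.mbox(path, create=False)
    try:
        for message in box:
            names = attachment_names(message)
            if names:
                yield message.get("Subject", ""), names, is_single_image(names)
    finally:
        box.close()
```

The single-image flag is reported rather than used to discard anything: the
asf-membership-application.jpg counter example above would be flagged by this
heuristic, which is the kind of "what if" result the thread suggests checking
against the archived mail before adopting a rule.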
