On Tue, Dec 1, 2015 at 9:36 PM, Craig L Russell <[email protected]> wrote:
> By the way, this is why I would like the filter to exclude -xxxxx.yyy and
> .zip files:
>
> documents/received] clr% svn up
> Updating '.':
> A    .-Mytypist2.jpg
> A    America-Airlines-orders-pikirangue.com--00000646722.zip

And... that's the problem with heuristics. A counter example (received
2015-10-13 10:30:48 +08:00) was "Software Grant and CCLA.zip". Or "ICLA
Apache Eagle.zip" (received 2015-10-28 19:55:44 Z). I can give plenty
more.

No question that *most* zip files are garbage.

> Craig

- Sam Ruby

>> On Dec 1, 2015, at 10:39 AM, Craig L Russell <[email protected]>
>> wrote:
>>
>>
>>> On Dec 1, 2015, at 8:57 AM, Sam Ruby <[email protected]> wrote:
>>>
>>> On Tue, Dec 1, 2015 at 11:32 AM, Craig L Russell
>>> <[email protected]> wrote:
>>>> Please review this patch. I don’t know whether I got the syntax right. I
>>>> think the logic is ok. ;-)
>>>>
>>>> If there were a test bench for this script I would use it…
>>>
>>> I have a number of incomplete rewrites in which I used the corpus of
>>> existing secretary archives as a test data. I would be happy to
>>> collaborate on such a rewrite. And to follow up on a previous
>>> comment: if secmail were to do a rsync and then run on whimsy-vm,
>>> there would be no need for the files received to be placed in svn.
>>> They could reside on the vm, and discarded or added by the workbench.
>>
>> The only thing I’d worry about is if the vm had to be restarted. Would the
>> in-process documents be lost?
>>
>>> The workbench could also show the original email text, which may
>>> contain relevant info.
>>
>> Indeed. Many times the submitter doesn’t fill the “notify PMC” field, even
>> if the submitter has been voted as a committer. So having quick access to
>> the email would save time.
>>>
>>>> The idea is to reject file names that begin with ‘-‘ and types that are
>>>> known bad.
>>>
>>> Did you intend to reject files whose names do not include a dot?
>>
>> Yes. Can you give me a counter example?
>>>
>>> Note: in Python, negative array indexes count from the end, so -1 is
>>> the last element, -2 is the second to the last, etc. There is no need
>>> for constructs like len(splitname) - 1.
>>
>> That’s what comes from being a py-newbie.
>>
>> I’ll send another patch.
>>
>> Craig
>>
>>>
>>> Not required, but Python also has some handy methods for parsing a
>>> path: https://docs.python.org/2/library/os.path.html
>>>
>>> - Sam Ruby
>>>
>>>
>>>> Index: secmail.py
>>>> ===================================================================
>>>> --- secmail.py (revision 974086)
>>>> +++ secmail.py (working copy)
>>>> @@ -180,8 +180,17 @@
>>>> if len(subpayload.get_payload(decode=True))<10240: continue
>>>> # if not subpayload.get_payload(decode=True): continue
>>>>
>>>> - # get_filename doesn't appear to have an endswith method
>>>> - # if subpayload.get_filename().endswith('.gpg'): continue
>>>> + # analyze file name and type
>>>> + filename = subpayload.get_filename()
>>>> + splitname = filename.split('.')
>>>> + if len(splitname) < 2: continue
>>>> + filebase = splitname[len(splitname) - 2]
>>>> + filetype = splitname[len(splitname) - 1]
>>>> + if filebase[0] == '-': continue
>>>> +
>>>> + rejecttypes = ['zip', 'doc', 'docx', 'xls', 'gpg']
>>>> + if filetype in rejecttypes: continue
>>>> +
>>>> attachments.append(subpayload)
>>>>
>>>> if len(attachments) == 0: return
>>>>
>>>>
>>>>
>>>> Craig L Russell
>>>> Architect, Oracle
>>>> http://db.apache.org/jdo
>>>> 408 276-5638 mailto:[email protected]
>>>> P.S. A good JDO? O, Gasp!
>>>>
>>>>
>>
>> Craig L Russell
>> Architect, Oracle
>> http://db.apache.org/jdo
>> 408 276-5638 mailto:[email protected]
>> P.S. A good JDO? O, Gasp!
>>
>
> Craig L Russell
> Architect, Oracle
> http://db.apache.org/jdo
> 408 276-5638 mailto:[email protected]
> P.S. A good JDO? O, Gasp!
>
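
For reference, the filter in the quoted patch could be sketched as a standalone function using the negative indexes mentioned in the review. This is only an illustration, not the code in secmail.py: the name should_reject and the REJECT_TYPES constant are hypothetical, and treating a missing filename as a rejection is an assumption (get_filename() can return None). Using startswith() rather than filebase[0] avoids an IndexError when the base part is empty, as in ".zip".

```python
# Extensions copied from the quoted patch; the list itself is a heuristic.
REJECT_TYPES = ('zip', 'doc', 'docx', 'xls', 'gpg')


def should_reject(filename):
    """Hypothetical helper: return True if the attachment should be skipped."""
    if not filename:
        # Assumption: an attachment with no filename is skipped.
        return True
    parts = filename.split('.')
    if len(parts) < 2:
        # No dot in the name, so there is no file type to inspect.
        return True
    # Negative indexes count from the end: -1 is the type, -2 the base.
    filebase, filetype = parts[-2], parts[-1]
    if filebase.startswith('-'):
        return True
    return filetype in REJECT_TYPES


if __name__ == '__main__':
    for name in ('.-Mytypist2.jpg', 'Software Grant and CCLA.zip', 'icla.pdf'):
        print(name, should_reject(name))
```

With these rules the spam attachment ".-Mytypist2.jpg" is rejected, but so is "Software Grant and CCLA.zip", which is the false positive raised above.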
