On Tue, Dec 1, 2015 at 9:36 PM, Craig L Russell
<[email protected]> wrote:
> By the way, this is why I would like the filter to exclude -xxxxx.yyy and 
> .zip files:
>
> documents/received] clr% svn up
> Updating '.':
> A    .-Mytypist2.jpg
> A    America-Airlines-orders-pikirangue.com--00000646722.zip

And... that's the problem with heuristics.  One counterexample
(received 2015-10-13 10:30:48 +08:00) was "Software Grant and
CCLA.zip".  Another was "ICLA Apache Eagle.zip" (received 2015-10-28
19:55:44 Z).  I can give plenty more.

No question that *most* zip files are garbage.
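
To make that concrete, here is a rough sketch of the heuristic from the patch quoted below, run against the four file names mentioned in this message. It is not the actual secmail.py code: `is_rejected` and `REJECT_TYPES` are hypothetical names, and `startswith` stands in for the patch's `filebase[0] == '-'` test so that an empty base name does not raise an error.

```python
# Hypothetical stand-alone version of the filter logic in the quoted patch.
REJECT_TYPES = ['zip', 'doc', 'docx', 'xls', 'gpg']

def is_rejected(filename):
    """Return True if the heuristic would skip this attachment."""
    parts = filename.split('.')
    if len(parts) < 2:
        return True                      # no dot in the name
    filebase, filetype = parts[-2], parts[-1]
    if filebase.startswith('-'):
        return True                      # name segment begins with '-'
    return filetype in REJECT_TYPES      # known-bad extension

names = [
    '.-Mytypist2.jpg',
    'America-Airlines-orders-pikirangue.com--00000646722.zip',
    'Software Grant and CCLA.zip',
    'ICLA Apache Eagle.zip',
]
for name in names:
    print(name, '->', 'reject' if is_rejected(name) else 'keep')
```

All four names are rejected, so the two legitimate submissions are dropped along with the two junk files.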

> Craig

- Sam Ruby

>> On Dec 1, 2015, at 10:39 AM, Craig L Russell <[email protected]> 
>> wrote:
>>
>>
>>> On Dec 1, 2015, at 8:57 AM, Sam Ruby <[email protected]> wrote:
>>>
>>> On Tue, Dec 1, 2015 at 11:32 AM, Craig L Russell
>>> <[email protected]> wrote:
>>>> Please review this patch. I don’t know whether I got the syntax right. I 
>>>> think the logic is ok. ;-)
>>>>
>>>> If there were a test bench for this script I would use it…
>>>
>>> I have a number of incomplete rewrites in which I used the corpus of
>>> existing secretary archives as test data.  I would be happy to
>>> collaborate on such a rewrite.  And to follow up on a previous
>>> comment: if secmail were to do an rsync and then run on whimsy-vm,
>>> there would be no need for the files received to be placed in svn.
>>> They could reside on the vm, and be discarded or added by the workbench.
>>
>> The only thing I’d worry about is if the vm had to be restarted. Would the 
>> in-process documents be lost?
>>
>>> The workbench could also show the original email text, which may
>>> contain relevant info.
>>
>> Indeed. Many times the submitter doesn’t fill the “notify PMC” field, even 
>> if the submitter has been voted as a committer. So having quick access to 
>> the email would save time.
>>>
>>>> The idea is to reject file names that begin with ‘-’ and types that are 
>>>> known bad.
>>>
>>> Did you intend to reject files whose names do not include a dot?
>>
>> Yes. Can you give me a counterexample?
>>>
>>> Note: in Python, negative array indexes count from the end, so -1 is
>>> the last element, -2 is the second to the last, etc.  There is no need
>>> for constructs like len(splitname) - 1.
>>
>> That’s what comes from being a py-newbie.
>>
>> I’ll send another patch.
>>
>> Craig
>>
>>>
>>> Not required, but Python also has some handy methods for parsing a
>>> path: https://docs.python.org/2/library/os.path.html
>>>
>>> - Sam Ruby
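
A short sketch of the two Python points above, for illustration only: negative indexes, and `os.path.splitext` for separating a name from its extension. The helper name `split_attachment_name` is hypothetical, and stripping the dot and lower-casing the extension are assumptions, not something the patch does.

```python
import os.path

# Negative indexes count from the end of the list.
parts = 'ICLA Apache Eagle.zip'.split('.')
assert parts[-1] == parts[len(parts) - 1]   # last element
assert parts[-2] == parts[len(parts) - 2]   # second to last

def split_attachment_name(filename):
    """Return (stem, extension) with the extension lower-cased and
    without its leading dot. Hypothetical helper, not part of secmail.py."""
    base = os.path.basename(filename)        # drop any directory part
    stem, ext = os.path.splitext(base)       # ext is '' or starts with '.'
    return stem, ext.lstrip('.').lower()

print(split_attachment_name('Software Grant and CCLA.zip'))
print(split_attachment_name('README'))
print(split_attachment_name('.gpg'))
```

One difference from `split('.')`: `os.path.splitext` treats a name that consists only of a leading dot plus text, such as `.gpg`, as having no extension at all, so a filter built on it would need to handle that case deliberately.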
>>>
>>>
>>>> Index: secmail.py
>>>> ===================================================================
>>>> --- secmail.py  (revision 974086)
>>>> +++ secmail.py  (working copy)
>>>> @@ -180,8 +180,17 @@
>>>>        if len(subpayload.get_payload(decode=True))<10240: continue
>>>>      # if not subpayload.get_payload(decode=True): continue
>>>>
>>>> -      # get_filename doesn't appear to have an endswith method
>>>> -      # if subpayload.get_filename().endswith('.gpg'): continue
>>>> +      # analyze file name and type
>>>> +      filename = subpayload.get_filename()
>>>> +      splitname = filename.split('.')
>>>> +      if len(splitname) < 2: continue
>>>> +      filebase = splitname[len(splitname) - 2]
>>>> +      filetype = splitname[len(splitname) - 1]
>>>> +      if filebase[0] == '-': continue
>>>> +
>>>> +      rejecttypes = ['zip', 'doc', 'docx', 'xls', 'gpg']
>>>> +      if filetype in rejecttypes: continue
>>>> +
>>>>      attachments.append(subpayload)
>>>>
>>>>  if len(attachments) == 0: return
>>>>
>>>>
>>>>
>>>> Craig L Russell
>>>> Architect, Oracle
>>>> http://db.apache.org/jdo
>>>> 408 276-5638 mailto:[email protected]
>>>> P.S. A good JDO? O, Gasp!
>>>>
>>>>
>>
>
