Package: mailscripts
Severity: wishlist
Version: 0.11-1
Tags: patch upstream

Please consider adding imap-dl to mailscripts (see attached patch).

This is a simple downloader that pulls from IMAP-over-TLS and deposits
in a Maildir.  It is in python3, and makes use of only the modern
stdlib.
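
To illustrate the "deposits in a Maildir" half using only the stdlib, here is a minimal sketch. The helper name `deposit` and the temporary directory are made up for this example; the attached script itself calls `mailbox.Maildir(...).add(...)` on the configured destination path.

```python
import mailbox
import os
import tempfile

def deposit(maildir_path, raw_message):
    '''Store one raw message (bytes) in an existing Maildir; return its key.'''
    # create=False: refuse to invent a Maildir that is not already there
    mdst = mailbox.Maildir(maildir_path, create=False)
    return mdst.add(raw_message)

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'Maildir')
        mailbox.Maildir(path, create=True)  # sets up cur/, new/ and tmp/
        key = deposit(path, b'Subject: test\n\nhello\n')
        print(key, os.listdir(os.path.join(path, 'new')))
```

New deliveries land in the `new/` subdirectory, so a mail reader will see them as unread.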

This means that those of us who use getmail to treat a remote IMAP
store as a POP message store no longer need to depend on getmail
(which is python2-only at the time of this writing).
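
The attached script is meant to read an existing getmail config directly. As a rough illustration of what that involves, the sketch below reads the three sections with `configparser`. The function name `read_settings` and the returned dict layout are hypothetical. It uses `getboolean` and `getint` where the script parses the strings by hand, so the accepted spellings differ slightly: `getboolean` also accepts `on`/`off` and raises `ValueError` on anything it does not recognise.

```python
import configparser

def read_settings(text):
    '''Parse a getmail-style config string into a plain dict (illustrative only).'''
    conf = configparser.ConfigParser()
    conf.read_string(text)
    return {
        'server': conf.get('retriever', 'server'),
        'port': conf.getint('retriever', 'port', fallback=993),
        'username': conf.get('retriever', 'username'),
        'path': conf.get('destination', 'path'),
        # absent [options] section or key -> False, i.e. never delete by default
        'delete': conf.getboolean('options', 'delete', fallback=False),
    }

EXAMPLE = '''
[retriever]
server = mail.example.net
username = foo
password = sekr1t!

[destination]
path = /home/foo/Maildir

[options]
delete = True
'''

if __name__ == '__main__':
    print(read_settings(EXAMPLE))
```

Defaulting `delete` to a real boolean `False` keeps the "should not lose messages" promise when the option is missing.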

   --dkg

From 0f17fac791cb6b2fd656b85d97c28f432571e750 Mon Sep 17 00:00:00 2001
From: Daniel Kahn Gillmor <d...@fifthhorseman.net>
Date: Sun, 15 Sep 2019 19:55:07 -0400
Subject: [PATCH] Add imap-dl, a simple imap downloader

getmail upstream appears to have no plans to convert to python3 in the
near future.

Some of us use only a minimal subset of features of getmail, and it
would be nice to have something simpler, with the main complexity
offloaded to the modern python3 stdlib.

Signed-off-by: Daniel Kahn Gillmor <d...@fifthhorseman.net>
---
 Makefile      |   1 +
 imap-dl       | 196 ++++++++++++++++++++++++++++++++++++++++++++++++++
 imap-dl.1.pod |  80 +++++++++++++++++++++
 3 files changed, 277 insertions(+)
 create mode 100755 imap-dl
 create mode 100644 imap-dl.1.pod

diff --git a/Makefile b/Makefile
index 352f6f0..860ec27 100644
--- a/Makefile
+++ b/Makefile
@@ -1,5 +1,6 @@
 MANPAGES=mdmv.1 mbox2maildir.1 \
 	notmuch-slurp-debbug.1 notmuch-extract-patch.1 maildir-import-patch.1 \
+	imap-dl.1 \
 	email-extract-openpgp-certs.1 \
 	email-print-mime-structure.1 \
 	notmuch-import-patch.1
diff --git a/imap-dl b/imap-dl
new file mode 100755
index 0000000..97fa4a8
--- /dev/null
+++ b/imap-dl
@@ -0,0 +1,196 @@
+#!/usr/bin/python3
+# -*- coding: utf-8 -*-
+
+# Copyright (C) 2019 Daniel Kahn Gillmor
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or (at
+# your option) any later version.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <https://www.gnu.org/licenses/>.
+
+'''A simple replacement for a minimalist use of getmail.
+
+Usage:
+
+   imap-dl [-v|--verbose] configfile…
+
+In particular, if you use getmail to reach an IMAP server as though it
+was POP (retrieving from the server and optionally deleting), you can
+point this script to the getmail config and it should do the same
+thing.
+
+It tries to ensure that the configuration file is of the expected
+type, and terminates by raising an exception if it is not.  It should
+not lose messages.
+
+If there's any interest in supporting other use cases for getmail,
+patches are welcome.
+
+If you've never used getmail, you can make the simplest possible
+config file like so:
+
+----------
+[retriever]
+server = mail.example.net
+username = foo
+password = sekr1t!
+
+[destination]
+path = /home/foo/Maildir
+
+[options]
+delete = True
+----------
+'''
+
+import configparser
+import sys
+import ssl
+import imaplib
+import re
+import logging
+import mailbox
+import os.path
+
+_summary_splitter = re.compile(rb'^(?P<id>[0-9]+) \(UID (?P<uid>[0-9]+) RFC822.SIZE (?P<size>[0-9]+)\)$')
+def break_fetch_summary(line):
+    '''b'1 (UID 160 RFC822.SIZE 1867)' -> {id: 1, uid: 160, size: 1867}'''
+    match = _summary_splitter.match(line)
+    if not match:
+        raise Exception('malformed summary line %s'%(line))
+    ret = {}
+    for i in ['id', 'uid', 'size']:
+        ret[i] = int(match[i])
+    return ret
+
+_fetch_splitter = re.compile(rb'^(?P<id>[0-9]+) \(UID (?P<uid>[0-9]+) (FLAGS \([\\A-Za-z ]*\) )?BODY\[\] \{(?P<size>[0-9]+)\}$')
+def break_fetch(line):
+    '''b'1 (UID 160 BODY[] {1867}' -> {id: 1, uid: 160, size: 1867}'''
+    match = _fetch_splitter.match(line)
+    if not match:
+        raise Exception('malformed fetch line %s'%(line))
+    ret = {}
+    for i in ['id', 'uid', 'size']:
+        ret[i] = int(match[i])
+    return ret
+
+def pull_msgs(configfile):
+    logging.info('pulling from config file %s'%(configfile,))
+    conf = configparser.ConfigParser()
+    conf.read_file(open(configfile, 'r'))
+    delete = conf.get('options', 'delete', fallback='False')
+    if not isinstance(delete, bool):
+        # conf.get() returns a string; map it to a real bool
+        if delete.lower() not in ['true', 'yes', '1', 'false', 'no', '0']:
+            logging.warning('options.delete has unknown value %s, treating as false'%(delete,))
+        # without this, the non-empty string 'false' would be truthy
+        delete = delete.lower() in ['true', 'yes', '1']
+    rtype = conf.get('retriever', 'type', fallback='SimpleIMAPSSLRetriever')
+    if rtype.lower() != 'simpleimapsslretriever':
+        raise Exception('Expected retriever.type to be SimpleIMAPSSLRetriever, got %s'%(rtype,))
+    # FIXME: handle `retriever.record_mailbox`
+    # FIXME: handle leading ~-style globbing for destination.path
+    dtype = conf.get('destination', 'type', fallback='Maildir')
+    if dtype.lower() != 'maildir':
+        raise Exception('expected destination.type to be Maildir, got %s'%(
+            dtype))
+    dst = conf.get('destination', 'path')
+    if not os.path.isdir(dst):
+        raise Exception('expected destination directory, but %s is not a directory'%(dst,))
+    ca_certs = conf.get('retriever', 'ca_certs', fallback=None)
+    ctx = ssl.create_default_context(cafile=ca_certs)
+    mdst = mailbox.Maildir(dst)
+    with imaplib.IMAP4_SSL(host=conf.get('retriever', 'server'),
+                           port=conf.getint('retriever', 'port', fallback=993),
+                           ssl_context=ctx) as imap:
+        resp = imap.login(conf.get('retriever', 'username'),
+                          conf.get('retriever', 'password'))
+        if resp[0] != 'OK':
+            raise Exception('login failed with %s as user %s on %s'%(
+                resp,
+                conf.get('retriever', 'username'),
+                conf.get('retriever', 'server')))
+        resp = imap.select()
+        if resp[0] != 'OK':
+            raise Exception('selection failed: %s'%(resp,))
+        if len(resp[1]) != 1:
+            raise Exception('expected exactly one EXISTS response from select, got %s'%(resp[1]))
+        n = int(resp[1][0])
+        if n == 0:
+            logging.info('No messages to retrieve')
+            return
+        resp = imap.fetch('1:%d'%(n), '(UID RFC822.SIZE)')
+        if resp[0] != 'OK':
+            raise Exception('initial FETCH 1:%d not OK (%s)'%(n, resp))
+        pending = list(map(break_fetch_summary, resp[1]))
+        sizes = {}
+        for m in pending:
+            sizes[m['uid']] = m['size']
+        fetched = set()
+        uids = ','.join(map(lambda x: str(x['uid']), sorted(pending, key=lambda x: x['uid'])))
+        totalbytes = sum([x['size'] for x in pending])
+        logging.info('Fetching %d messages, for a total of %d bytes'%(
+            len(pending), totalbytes))
+        # FIXME: sort by size?
+        # FIXME: fetch in batches or singly instead of all-at-once?
+        # FIXME: rolling deletion?
+        # FIXME: asynchronous work?
+        resp = imap.uid('fetch', uids, '(UID BODY[])')
+        if resp[0] != 'OK':
+            raise Exception('UID fetch failed %s'%(resp[0]))
+        for f in resp[1]:
+            # these objects are weirdly structured. i don't know why
+            # these trailing close-parens show up.  so this is very
+            # ad-hoc and nonsense
+            if isinstance(f, bytes):
+                if f != b')':
+                    raise Exception('got bytes object of length %d but expected simple closeparen'%(len(f),))
+            elif isinstance(f, tuple):
+                if len(f) != 2:
+                    raise Exception('expected 2-part tuple, got %d-part'%(
+                        len(f)))
+                m = break_fetch(f[0])
+                if m['size'] != len(f[1]):
+                    raise Exception('expected %d octets, got %d'%(
+                        m['size'], len(f[1])))
+                if m['size'] != sizes[m['uid']]:
+                    raise Exception('summary said %d octets, fetch sent %d'%(
+                        sizes[m['uid']], m['size']))
+                fname = mdst.add(f[1])
+                logging.info('stored message %d/%d (uid %d, %d bytes) in %s'%(
+                    len(fetched) + 1, len(pending), m['uid'], m['size'], fname))
+                del sizes[m['uid']]
+                fetched.add(m['uid'])
+        if sizes:
+            logging.warning('unhandled UIDs: %s'%(sizes))
+        if delete:
+            logging.info('trying to delete %d messages from IMAP store'%(len(fetched)))
+            resp = imap.uid('STORE', ','.join(map(str, fetched)), '+FLAGS', r'(\Deleted)')
+            if resp[0] != 'OK':
+                raise Exception('failed to set \\Deleted flag: %s'%(resp))
+            resp = imap.expunge()
+            if resp[0] != 'OK':
+                raise Exception('failed to expunge! %s'%(resp))
+        else:
+            logging.info('not deleting any messages, since delete=False')
+
+if __name__ == '__main__':
+    args = sys.argv[1:]
+    for varg in ['-v', '--verbose']:
+        while varg in args:
+            logging.getLogger().setLevel(logging.INFO)
+            args.remove(varg)
+
+    if not args:
+        logging.error('no config files supplied, must supply at least one')
+        sys.exit(1)
+    for confname in args:
+        pull_msgs(confname)
diff --git a/imap-dl.1.pod b/imap-dl.1.pod
new file mode 100644
index 0000000..4afdaf0
--- /dev/null
+++ b/imap-dl.1.pod
@@ -0,0 +1,80 @@
+=encoding utf8
+
+=head1 NAME
+
+imap-dl -- a simple replacement for a minimalist use of getmail
+
+=head1 SYNOPSIS
+
+B<imap-dl> [B<-v>|B<--verbose>] B<configfile>...
+
+=head1 DESCRIPTION
+
+If you use getmail to reach an IMAP server as though it was POP
+(retrieving from the server, storing it in a maildir and optionally
+deleting), you can point this script to the getmail config and it
+should do the same thing.
+
+It tries to ensure that the configuration file is of the expected
+type, and terminates by raising an exception if it is not.  It should
+not lose messages.
+
+If there's any interest in supporting other use cases for getmail,
+patches are welcome.
+
+=head1 OPTIONS
+
+B<-v> or B<--verbose> causes B<imap-dl> to print more details
+about what it is doing.
+
+=head1 EXAMPLE CONFIG
+
+If you've never used getmail, you can make the simplest possible
+config file like so:
+
+=over 4
+
+    [retriever]
+    server = mail.example.net
+    username = foo
+    password = sekr1t!
+
+    [destination]
+    path = /home/foo/Maildir
+
+    [options]
+    delete = True
+
+=back
+
+=head1 LIMITATIONS
+
+B<imap-dl> is currently deliberately minimal.  It is designed to be
+used by someone who treats their IMAP mailbox like a POP server.
+
+It works with IMAP-over-TLS only, and it just fetches all messages
+from the default IMAP folder.  It does not support all the various
+features of getmail.
+
+B<imap-dl> is deliberately implemented in a modern version of python3,
+and tries to just use the standard library.  It will not be backported
+to python2.
+
+B<imap-dl> uses imaplib, which means that it does synchronous calls to
+the imap server.  A more clever implementation would use asynchronous
+python to avoid latency/roundtrips.
+
+B<imap-dl> does not know how to wait and listen for new mail using
+IMAP IDLE.  This would be a nice additional feature.
+
+B<imap-dl> does not yet know how to deliver to an MDA (or to
+B<notmuch-insert>).  This would be a nice thing to be able to do.
+
+=head1 SEE ALSO
+
+https://tools.ietf.org/html/rfc3501, http://pyropus.ca/software/getmail/
+
+=head1 AUTHOR
+
+B<imap-dl> and this manpage were written by Daniel Kahn Gillmor,
+inspired by some functionality from the getmail project.
-- 
2.23.0
