current iteration of Python curses mail client

Kragen Javier Sitaker Sat, 04 Nov 2006 00:37:08 -0800

You still probably don't want to use it, since it doesn't have any
internal functions for sending or replying to email (leaving that to
Emacs), it only supports local mbox files, and it assumes that
existing messages in your mailbox never change.


I've cleaned it up a fair bit; the code should be more comprehensible
and less disgusting now.  Also:
- it supports internationalization in headers and bodies (well, first
  parts)
- slightly improved visual appearance
- many operations, especially common ones, are dramatically faster
- it can scroll backwards in messages now
- it can jump forward and backward by subject
- it no longer crashes if you hit ^C at the wrong time
- new "count matches" command

The next obvious thing to do is to make all the searching stuff use an
on-disk inverted index (possibly just for headers at first) and rip
out all this query caching stuff.  This would make most operations
fast instead of merely less slow, or slow less often.  But that's
hard, so I've been avoiding it.
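
To make the inverted-index idea concrete, here is a minimal sketch in modern Python 3 (not the Python 2 of the client below). Everything in it is an assumption for illustration: the names build_header_index, save_index, load_index and search_index, the crude tokenizer, and the choice of a JSON file as the on-disk format. A real index for a gigabyte mailbox would want something more compact and incrementally updatable.

```python
import json
import os
import re

TOKEN = re.compile(r'[a-z0-9]+')

def build_header_index(messages):
    """Map each lowercased header word to the ascending list of message
    numbers whose headers contain it.  `messages` is a list of dicts of
    header name -> value (a stand-in for per-message summaries)."""
    index = {}
    for number, headers in enumerate(messages):
        words = set()
        for value in headers.values():
            words.update(TOKEN.findall(value.lower()))
        for word in words:
            index.setdefault(word, []).append(number)
    return index

def save_index(index, filename):
    # Write under a temporary name and rename, so an interrupted write
    # never replaces a good index with a partial one.
    tmp = filename + '.new'
    with open(tmp, 'w') as f:
        json.dump(index, f)
    os.replace(tmp, filename)

def load_index(filename):
    with open(filename) as f:
        return json.load(f)

def search_index(index, query):
    """Return the message numbers whose headers contain every query word."""
    words = TOKEN.findall(query.lower())
    if not words:
        return []
    postings = [set(index.get(word, ())) for word in words]
    return sorted(set.intersection(*postings))
```

With such an index a query costs a few dictionary lookups and a set intersection, independent of mailbox size, which is why it would remove the need for caching query results.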

Even with all its defects, it's good enough that I've been using it
full-time, in preference to less(1), for nearly a month now.

The detailed darcs changelog is below.  Note that there was one hour
in which I committed 9 separate patches.  This is much easier to do
with darcs than with CVS or Subversion, especially when you're not
connected to the internet.

Wed Oct 11 19:31:16 VET 2006  [EMAIL PROTECTED]
  * tolerate curses croaking on header lines

Wed Oct 11 12:47:13 VET 2006  [EMAIL PROTECTED]
  * added primitive 'reply' function

Tue Oct 10 13:44:17 VET 2006  [EMAIL PROTECTED]
  * renamed MessageProxy to EmailMessage

Tue Oct 10 13:41:42 VET 2006  [EMAIL PROTECTED]
  * some comment updates

Tue Oct 10 13:40:59 VET 2006  [EMAIL PROTECTED]
  * factored out scrolling from mainloop, added backward scrolling

Mon Oct  9 17:50:02 VET 2006  [EMAIL PROTECTED]
  * refactored much of mainloop into a key_table

Mon Oct  9 17:39:49 VET 2006  [EMAIL PROTECTED]
  * added forward and backward by subject

Mon Oct  9 17:39:19 VET 2006  [EMAIL PROTECTED]
  * removed stdscr and mboxlist local variables in mainloop

Mon Oct  9 17:38:47 VET 2006  [EMAIL PROTECTED]
  * refactored summary paging slightly

Mon Oct  9 17:37:50 VET 2006  [EMAIL PROTECTED]
  * factored go_to_end and count_messages out of mainloop

Mon Oct  9 17:37:10 VET 2006  [EMAIL PROTECTED]
  * made more_detail, less_detail reset to top of message

Mon Oct  9 17:35:27 VET 2006  [EMAIL PROTECTED]
  * made search editing slightly easier

Mon Oct  9 17:34:58 VET 2006  [EMAIL PROTECTED]
  * fix minor performance bug introduced in query refactoring

Mon Oct  9 17:34:47 VET 2006  [EMAIL PROTECTED]
  * refactored queries into a bunch of SearchTerm objects
  
  The code is somewhat more readable, but definitely more verbose --- added
  nearly 40 lines of code, net.
  
  The idea is to better support things like smarter query caching (e.g. don't
  discard cache on tag change if the query doesn't depend on tags), smarter
  highlighting, quicker evaluation strategies, union queries, using indexes,
  etc.
  
  There are a bunch of small behavior changes from this "refactoring":
  - accidental occurrences of things like 't:kragen' in the email are no longer
    highlighted (nor is time spent searching for them)
  - now 't:' and 'l:' search in the decoded versions of their headers as well as
    the standard versions
  - now you can say "--foo", which means the same thing as "foo".
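
The design described in this entry can be sketched roughly as follows, in modern Python 3 rather than the client's Python 2. This is a simplified illustration, not the client's code: the class names, the CachedQuery wrapper, the tag_version counter, and the use of plain dicts and tuples for headers and tags are all assumptions. It shows the two points made above: each term reports whether it depends on tags, so the cache is discarded on a tag change only when it has to be, and "--foo" parses as a doubly negated "foo".

```python
class Term:
    """Matches every message; base class for the other terms."""
    def __init__(self, text):
        self.text = text
    def depends_on_tags(self):
        return False
    def matches(self, headers, tags):
        return True

class AnywhereTerm(Term):
    def matches(self, headers, tags):
        return any(self.text in value for value in headers.values())

class SubjectTerm(Term):
    def matches(self, headers, tags):
        return self.text in headers.get('subject', '')

class TagTerm(Term):
    def depends_on_tags(self):
        return True
    def matches(self, headers, tags):
        return self.text in tags

class NotTerm(Term):
    def __init__(self, text):
        self.kid = parse_term(text)
    def depends_on_tags(self):
        return self.kid.depends_on_tags()
    def matches(self, headers, tags):
        return not self.kid.matches(headers, tags)

# Checked in order; the empty prefix is the catch-all.
PREFIXES = [('-', NotTerm), ('@', TagTerm), ('s:', SubjectTerm),
            ('', AnywhereTerm)]

def parse_term(word):
    for prefix, term_class in PREFIXES:
        if word.startswith(prefix):
            return term_class(word[len(prefix):])

class CachedQuery:
    def __init__(self, words):
        self.terms = [parse_term(word) for word in words.split()]
        self.cache = {}              # message number -> bool
        self.seen_tag_version = 0
    def depends_on_tags(self):
        return any(term.depends_on_tags() for term in self.terms)
    def matches(self, number, headers, tags, tag_version):
        if tag_version != self.seen_tag_version:
            # Only throw away results that tags could have changed.
            if self.depends_on_tags():
                self.cache.clear()
            self.seen_tag_version = tag_version
        if number not in self.cache:
            self.cache[number] = all(term.matches(headers, tags)
                                     for term in self.terms)
        return self.cache[number]
```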

Sun Oct  8 21:22:54 VET 2006  [EMAIL PROTECTED]
  * moved detail_level into summary_move_page

Sun Oct  8 21:22:24 VET 2006  [EMAIL PROTECTED]
  * made background of message summary screen not bold
  
  Even non-bold text was appearing as bold as a result.

Sun Oct  8 21:22:10 VET 2006  [EMAIL PROTECTED]
  * moved toggle_spam_tag into MessageBrowser

Sun Oct  8 21:19:57 VET 2006  [EMAIL PROTECTED]
  * keep caching non-tag search results even if tags change

Sun Oct  8 21:08:30 VET 2006  [EMAIL PROTECTED]
  * fixed small performance bug in query caching
  
  There was a problem that when searching backwards, if the search was
  unsuccessful and left you at the first message, the backward_cache would get
  updated, but the forward_cache wouldn't.

Sun Oct  8 03:21:06 VET 2006  [EMAIL PROTECTED]
  * fixed summary page moves to use new detail_level

Sun Oct  8 03:20:47 VET 2006  [EMAIL PROTECTED]
  * made background in overview view blue to diminish flashing

Sun Oct  8 03:20:08 VET 2006  [EMAIL PROTECTED]
  * moved flash_msg into MailBrowser

Sun Oct  8 00:31:15 VET 2006  [EMAIL PROTECTED]
  * replaced view_source and viewing_summary with a detail_level

Sun Oct  8 00:30:16 VET 2006  [EMAIL PROTECTED]
  * fixed a bug that corrupted the tags file

Sun Oct  8 00:05:00 VET 2006  [EMAIL PROTECTED]
  * trivial reformatting

Sun Oct  8 00:04:19 VET 2006  [EMAIL PROTECTED]
  * factored out change_search() and summary_move_page() from mainloop

Sat Oct  7 23:51:18 VET 2006  [EMAIL PROTECTED]
  * prevent unnecessary disk writes of cached metadata

Sat Oct  7 23:50:30 VET 2006  [EMAIL PROTECTED]
  * updated comments and reformatted to 80 columns

Sat Oct  7 23:42:10 VET 2006  [EMAIL PROTECTED]
  * moved tag_message into MailBrowser, factored out debug_dump()

Sat Oct  7 23:16:01 VET 2006  [EMAIL PROTECTED]
  * moved display_message_summary into MailBrowser

Sat Oct  7 23:06:16 VET 2006  [EMAIL PROTECTED]
  * refactored duplication out of go_forward and go_backward

Sat Oct  7 22:54:46 VET 2006  [EMAIL PROTECTED]
  * added backward search, plus removed some duplication

Sat Oct  7 22:54:11 VET 2006  [EMAIL PROTECTED]
  * removed obsolete Perlis comment

Sat Oct  7 22:53:33 VET 2006  [EMAIL PROTECTED]
  * dramatically sped up going to the end of the file
  
  Sadly it still wants to rewrite the summary even if it's unchanged.

Sat Oct  7 22:36:57 VET 2006  [EMAIL PROTECTED]
  * fixed rather serious performance bug
  
  Previously if a header was requested that didn't exist in the message, even
  though the summary told us that, we would fetch the message off the disk
  anyway.  This was a big problem for e.g. mailing list membership queries.
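
The fix amounts to treating "the summary says this header is absent" as a cache hit rather than a miss. A minimal sketch of that distinction, in modern Python 3 and with hypothetical names (CachedHeaders, load_headers, and the loads counter are inventions for this example, not the client's API):

```python
class CachedHeaders:
    def __init__(self, load_headers):
        self.load_headers = load_headers  # slow path: parses the message
        self.cached = {}                  # header -> value, None if absent
        self.loads = 0                    # counts slow-path calls

    def get(self, key, default=None):
        # Membership, not truthiness: a cached None means "known to be
        # absent", which must not trigger another read from disk.
        if key in self.cached:
            value = self.cached[key]
            return default if value is None else value
        self.loads += 1
        value = self.load_headers().get(key)
        self.cached[key] = value
        return default if value is None else value

msg = CachedHeaders(lambda: {'from': 'someone@example.org'})
msg.get('list-post', '')   # first lookup parses the message
msg.get('list-post', '')   # second lookup is answered from the cache
print(msg.loads)
```

For a query like mailing-list membership, most messages lack the header in question, so the absent case is the common one and the one that most needs to be fast.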

Sat Oct  7 22:19:23 VET 2006  [EMAIL PROTECTED]
  * split Query into Query and View

Sat Oct  7 21:48:36 VET 2006  [EMAIL PROTECTED]
  * cached query results and simplified query interface
  
  Now things display quickly.

Sat Oct  7 21:22:45 VET 2006  [EMAIL PROTECTED]
  * queries access the tag store as an instance variable now
  
  Previously they accessed them as parameters.
  

Sat Oct  7 21:11:52 VET 2006  [EMAIL PROTECTED]
  * simplified interface to MessageListFacade by removing fp

Sat Oct  7 21:08:36 VET 2006  [EMAIL PROTECTED]
  * moved cachefilename into MessageListFacade class

Sat Oct  7 20:58:18 VET 2006  [EMAIL PROTECTED]
  * moved tags, query, and redraw into MailBrowser
  
  Now, of redraw()'s parameters, only view_source remains as a local variable.

Sat Oct  7 20:45:09 VET 2006  [EMAIL PROTECTED]
  * finished moving query stuff into new Query object
  
  Now the only time anything outside of Query touches search_terms is when the
  user is editing them.

Sat Oct  7 20:40:54 VET 2006  [EMAIL PROTECTED]
  * continued moving search term handling into new Query object

Sat Oct  7 20:32:38 VET 2006  [EMAIL PROTECTED]
  * began moving search term handling into new Query object

Sat Oct  7 20:20:27 VET 2006  [EMAIL PROTECTED]
  * moved mboxlist, current_message, and line_offset into instance variables
  
  Also eliminated local variables mbox and mboxobj.
  
  At last we begin to eliminate some code duplication!

Sat Oct  7 20:08:57 VET 2006  [EMAIL PROTECTED]
  * turned local current_message into instance variable

Sat Oct  7 20:04:45 VET 2006  [EMAIL PROTECTED]
  * removed message_index local variable, made it an instance variable

Sat Oct  7 20:04:29 VET 2006  [EMAIL PROTECTED]
  * fixed bug with malformed header handling

Sat Oct  7 20:04:05 VET 2006  [EMAIL PROTECTED]
  * moved list-headers comments block to more relevant place

Sat Oct  7 19:47:50 VET 2006  [EMAIL PROTECTED]
  * moved realmain() into a new MailBrowser object

Sat Oct  7 19:41:27 VET 2006  [EMAIL PROTECTED]
  * recording failed attempt to marshal

Sat Oct  7 19:18:30 VET 2006  [EMAIL PROTECTED]
  * sped up scrolling dramatically in large message
  
  Added more code duplication to realmain().  But now the message object itself
  has its lines, so we don't have to resplit it for each screen update.
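
The speedup comes from splitting the message into lines once and keeping the result on the message object, so each screen update only indexes into a list. A minimal sketch of the idea in modern Python 3; the class name Message and the window method are assumptions for this example, not the client's own names:

```python
class Message:
    def __init__(self, text):
        self.text = text
        self._lines = None          # filled in on first use

    def lines(self):
        # Split once; later calls return the same list.
        if self._lines is None:
            self._lines = self.text.split('\n')
        return self._lines

    def window(self, offset, height):
        """Return `height` lines starting at `offset`, padding with
        empty strings past the end of the message."""
        lines = self.lines()
        return [lines[i] if i < len(lines) else ''
                for i in range(offset, offset + height)]
```

Scrolling by one line then costs time proportional to the screen height rather than to the size of the message.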

Sat Oct  7 19:13:29 VET 2006  [EMAIL PROTECTED]
  * added charset support for headers and bodies

Sat Oct  7 19:12:45 VET 2006  [EMAIL PROTECTED]
  * added 'count matches' command

Sat Oct  7 19:12:21 VET 2006  [EMAIL PROTECTED]
  * fixed tiny header parsing bug that showed up in some spam

#!/usr/bin/python
import curses, time, cgitb, sys, email, mailbox, re, os, curses.textpad
import cPickle, quopri, base64, email.Utils

# Embarrassingly ugly and fairly minimal mail reader.
# TODO:
# - CLEAN UP MESSY AND DUPLICATED CODE
# D parse search terms only once, into search term objects.
# D go to previous message
# - display headers (well, minimally happening now)
#   D in default display, display only person's name, not email address (or
#     vice versa)
#   D make subject not wrap onto subsequent lines
#   D provide a command for full message headers display
# D speed up index display!
# D scroll around message
# - take other actions such as bounce, approve, reply, or tag
#   D got a primitive 'tag' function
#   D add some amount of filtering
# - refactor:
#   - don't use email.Message for parsing?  (usually don't now)
#   D remove duplication among many redraw() calls
# - make it possible to display multiple messages
# D handle IndexError without crashing
# - handle other exceptions without crashing
# D cache message summary data on disk
#   - still not good enough and not an unqualified win
# - adjust to screen resizing
#   - probably not possible without modifying Python curses binding to
#     support resizeterm(3NCURSES).  Maybe use ctypes?
#     - how does Urwid do it?  (at first glance, it looks like it looks for 410
#       KEY_RESIZE)
# - display multiple messages at once
# - interface with mailman code
# - search
#   - ok, faster search, using an index!
# D make it lazier about loading messages
# D support fielded searches (to:kragen s:[silk]) for better speed and accuracy
#   X done in a really crappy way
# D just use file.read() for .as_string()
# - implement command-line history for searches
# D how about page-up and page-down (KEY_NPAGE, KEY_PPAGE) to display next
#   or previous summary page?  Searches are now fast enough that's
#   worthwhile...
# - how about message history? ('go to last-seen message')
# - make stuff more concurrent so as to prefetch search results and message
#   metadata
# - handle ^C while waiting for keystroke sanely?  May require
#   becoming more event-driven, since it's possible to ^C the "while
#   1:" and things like that.
# - display menu letters in different color rather than in []
# - support initial backward searches ([EMAIL PROTECTED] RET)
# - maybe transcode for display charsets like ISO-8859-1?
#   - both in contents and in =?ISO-8859-1?Q?=BFno=3F?=
# - come up with a way to go to the end of the mailbox quickly!

# Mail parsing performance on my 1.1GHz laptop is on the order of 2
# megabytes and 200 messages per second.  For my current email, it
# uses 462 bytes of virtual memory per message. It used to use only 19
# bytes per message, but I wanted to be able to do some kinds of
# searches quickly, without reading and reparsing the entire mailbox
# again.

# So, once it's fully parsed my nearly-1-gig mailbox, it needs less
# than 50MB of virtual memory, which is good.  However, it needed
# something like 7 minutes of CPU time to do that, which is still too
# slow --- if it started out displaying the last message instead of
# the first, I'd be happy.

# OK, now I pickle the current state when the user hits '>' (pickling
# takes 4-5 seconds) and unpickle it at startup (another maybe 5
# seconds).  The pickled file is roughly half the size of the virtual
# memory footprint (18 MB in my case), so it's not a big deal.  The
# 5-second startup is still a big deal (to me), as is the potential
# for fragility.

def cargo_cult_routine(win):
    win.clear()
    win.refresh()
    curses.nl()
    curses.noecho()

# I was using UnixMailbox, but it broke on squeak-dev archives, which look
# like this:
# From johnmci at smalltalkconsulting.com  Sat May  1 00:52:54 2004
# so now I use PortableUnixMailbox instead.
class SeekableUnixMailbox(mailbox.PortableUnixMailbox):
    def tell(self): return self.seekp
    def seek(self, pointer): self.seekp = pointer

def fastparse(fp):
    wsp = re.compile(r'\s+')
    hdr = re.compile(r'([^\s:]+):\s*(.*)')
    curhdr = None
    rv = {}
    while 1:
        line = fp.readline()
        while line.endswith('\n') or line.endswith('\r'): line = line[:-1]
        h = hdr.match(line)
        if h:
            curhdr = h.group(1).lower()
            rv[curhdr] = h.group(2)
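        elif wsp.match(line) and curhdr is not None:
            # Continuation line; one that precedes any header is ignored
            # rather than raising KeyError.
            rv[curhdr] += '\n' + line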
        elif not line:
            return rv
        else:
            pass # probably the From line

def mintern(obj):
    try: return intern(obj)
    except TypeError: return obj

def joinlines(datum):
    return re.compile(r'\n\s+').sub(' ', datum)

qre = re.compile('''(?ix)=
    \?(?P<charset>[^?]*)
    \?(?P<encoding>[bq])
    \?(?P<content>.*?)
    \?=''')
def decode_header_for_display(headerstring):
    def decode_chunk(mo):
        try:
            if mo.group('encoding').lower() == 'q': encoding_scheme = quopri
            else: encoding_scheme = base64
            orig_string = encoding_scheme.decodestring(mo.group('content'))
            return unicode(orig_string, mo.group('charset')).encode('utf-8')
        except (LookupError, UnicodeDecodeError, base64.binascii.Error):
            return mo.group(0)
    return qre.sub(decode_chunk, headerstring)

class msglines:
    def __init__(self, body): self.lines = body.split('\n')
    def __getitem__(self, ii):
        if ii < len(self.lines): return self.lines[ii]
        else: return ''
    def __iter__(self): return iter(self.lines)

# Identifying headers produced by various mailing list managers:
#               Mailman  Listserv  ezmlm  Yahoo Groups  Google Groups  Majordomo
# Sender        X        X         -      X             X              X
# Mailing-List  -        -         X      X             X              -
# List-Id       X        -         -      X             X              -
# List-Post     X        -         X      -             X              -

# So if we had to pick just one header to make quickly available for
# mailing-list filtering, it would be Sender, because Listserv and
# Majordomo only support Sender.  But Sender doesn't support ezmlm,
# and usually doesn't contain the actual list name; some examples:
# Sender                                    List address         Software
# [EMAIL PROTECTED]                         [EMAIL PROTECTED]    Mailman
# [EMAIL PROTECTED]                         beowulf@beowulf.org  Mailman
# [EMAIL PROTECTED]                         [EMAIL PROTECTED]    Majordomo
# [EMAIL PROTECTED]                         [EMAIL PROTECTED]    Yahoo Groups
# Vanagon Mailing List <[EMAIL PROTECTED]>  [EMAIL PROTECTED]    Listserv
# [EMAIL PROTECTED]                         [EMAIL PROTECTED]    Google Groups

# Like Sender, Mailing-List usually doesn't contain the actual list
# address.  The others (List-Id and List-Post) usually do, so I'm
# going to use List-Post.

class EmailMessage:
    # 41988K 6:10
    # keys = 'from subject message-id date'.split()

    # Sender and List-Post allow identification of most mailing lists.
    # 48784K 5:40
    # keys = 'from subject message-id date sender list-post'.split()
    # 56264K 6:32 without interning; 44116K 7:05 with interning
    keys = 'from subject message-id date sender list-post to cc'.split()
    def __init__(self, fileobj):
        self.fileobj = fileobj
        self._fastparse = None
        self._msg = None
        self.cached_metadata = {}
        self._as_string_lines = None
        self._utf8_body_lines = None
    def __repr__(self): return '<EmailMessage %r>' % (self.__dict__,)
    def msg(self):
        if self._msg is not None: return self._msg
        self.fileobj.seek(0)
        self._msg = email.message_from_file(self.fileobj)
        return self._msg
    def fastparse(self):
        # This speeds up e.g. searching for tags in a previously
        # unread part of the mailbox by about a factor of pi:
        if self._fastparse is None:
            self.fileobj.seek(0)
            self._fastparse = fastparse(self.fileobj)
        return self._fastparse
    def __getitem__(self, key):
        if key in self.cached_metadata: return self.cached_metadata[key]
        # For efficiency, crash the program and make the programmer
        # think about the time/space tradeoffs, and fix it, instead of
        # running slowly.
        elif key not in self.keys: raise KeyError, key
        return mintern(self.fastparse()[key])
    def get(self, key, default=None):
        if key in self.cached_metadata:
            rv = self.cached_metadata[key]
            if rv is None: return default
            else: return rv
        elif key not in self.keys: raise KeyError, key
        return mintern(self.fastparse().get(key, default))
    def get_readable(self, key, default):
        """Return a readable string representation of the header contents."""
        return decode_header_for_display(joinlines(self.get(key, default)))
    def get_slow(self, key, default):
        return self.fastparse().get(key, default)        
    def as_string(self):
        # This is at least 10x faster than self.msg().as_string():
        self.fileobj.seek(0)
        return self.fileobj.read()
    def utf8_body(self):
        # This is the only operation that routinely still reads from
        # the file, and the only operation that uses the slow
        # email.Message parser instead of fastparse.
        # Whoever was designing the email.Message API was smoking
        # crack.  get_payload returns either a string, or a list,
        # except that it might return None, if it would have returned
        # a list except that you specified decode=True.  So we end up
        # getting the same payload twice.
        msgobj = self.msg()
        while 1:
            discriminator = msgobj.get_payload()
            if not isinstance(discriminator, type([])):
                payload = msgobj.get_payload(decode=True)
                charset = msgobj.get_content_charset() or 'utf-8'
                if charset == 'us-ascii': charset = 'utf-8'  # safer & compat.
                if charset == 'utf-8': return payload
                try: return unicode(payload, charset).encode('utf-8')
                except (LookupError, UnicodeDecodeError): return payload
            else:
                msgobj = msgobj.get_payload(0)
    def utf8_body_lines(self):
        if not self._utf8_body_lines:
            self._utf8_body_lines = msglines(self.utf8_body())
        return self._utf8_body_lines
    def as_string_lines(self):
        if not self._as_string_lines:
            self._as_string_lines = msglines(self.as_string())
        return self._as_string_lines
    # hmm, maybe this should be a different kind of object, one with
    # the cached metadata:
    def cached_metadata_is(self, values):
        for key, value in zip(self.keys, values):
            self.cached_metadata[key] = value

def reply_skeleton(msg, me):
    hdrs = ['From: %s\n' % me,
            'MIME-Version: 1.0\n',
            'Content-Type: text/plain; charset=utf-8\n',
            'Content-Transfer-Encoding: 8bit\n']

    reply_to = msg.get_slow('reply-to', None) or msg.get('from', None)
    hdrs.append('To: %s\n' % reply_to)

    to_addr = msg.get('to', '')
    cc_addr = msg.get('cc', '')
    if to_addr and cc_addr: dont_cc = to_addr + ', ' + cc_addr
    else: dont_cc = to_addr or cc_addr
    hdrs.append('Dont-Cc: %s\n' % dont_cc)

    subject = msg.get('subject', 'Your mail')
    if 're:' not in subject.lower(): subject = 'Re: ' + subject
    hdrs.append('Subject: %s\n' % subject)

    msg_id = msg.get('message-id', None)
    if msg_id: hdrs.append('In-Reply-To: %s\n' % msg_id)

    references = msg.get_slow('references', '')
    if msg_id: references += '\n\t%s' % msg_id
    hdrs.append('References: %s\n' % references)
                              
    date = msg.get('date', 'an unknown date')
    name = realname(msg.get('from', 'an unknown person'))
    citation_line = 'On %s, %s wrote:\n' % (date, name)

    return ''.join(hdrs + ['\n', citation_line] +
                   ['> %s\n' % line for line in msg.utf8_body_lines()] +
                   ['\n'])

class MessageListFacade:
    def __init__(self, mboxfilename):
        self.fp = file(mboxfilename)
        # We don't really care what kind of objects self.mbox.next()
        # returns, as long as they aren't None.
        self.mbox = SeekableUnixMailbox(self.fp, lambda subfile: subfile)
        self.msgs = [self.mbox.tell()]
        self.metadata = []
        self.cachefilename = mboxfilename + '.cached-summary.pck'
        self.dirty_bit = False
    def last_known_message(self):
        return len(self.msgs) - 1
    def __getitem__(self, index):
        self.mbox.seek(self.msgs[-1])
        while index + 1 >= len(self.msgs):
            msg = self.mbox.next()
            if msg is None:
                # Someone was unclear on the iterator protocol
                # when they created the mailbox module; should
                # have used StopIteration!
                raise IndexError(index)
            self.dirty_bit = True
            # This puts the *end* offset of each message onto self.msgs
            self.msgs.append(self.mbox.tell())
        subfile = mailbox._Subfile(self.fp,
                                   self.msgs[index], self.msgs[index+1])
        rv = EmailMessage(subfile)
        while len(self.metadata) <= index: self.metadata.append(None)
        if not self.metadata[index]:
            self.dirty_bit = True
            self.metadata[index] = tuple(map(rv.get, rv.keys))
        rv.cached_metadata_is(self.metadata[index])
        return rv
    # I tried using marshal instead of cPickle to save the strings and
    # numbers that constitute the summary.  With marshal, saving cache
    # takes 1.3 seconds and 17MB; loading takes 1.8.  With cPickle,
    # saving cache takes 2.0 seconds and 16MB; loading takes 1.9.  The
    # difference is detectable but not worthwhile.
    def write_cached_metadata(self):
        if not self.dirty_bit: return
        try:
            newfilename = self.cachefilename + '.new'
            # Pickle protocol 2 is a binary format, so open in binary mode.
            outfile = file(newfilename, 'wb')
            cPickle.dump(self.metadata, outfile, 2)
            cPickle.dump(self.msgs, outfile, 2)
            outfile.close()
            os.rename(newfilename, self.cachefilename)
            self.dirty_bit = False
        except KeyboardInterrupt: pass
    def read_cached_metadata(self):
        # The cache is a binary pickle, so read it in binary mode.
        try: infile = file(self.cachefilename, 'rb')
        except IOError: return
        self.metadata = cPickle.load(infile)
        self.msgs = cPickle.load(infile)
        self.dirty_bit = False
        # XXX need to validate the metadata!

class tagstore:
    def __init__(self, filename=None):
        if filename is None:
            filename = os.path.join(os.environ['HOME'], '.cursmailmsgtags')
        self.file = file(filename, 'a+')
        self.tags = {}
        self.update_count = 0
        for line in self.file:
            fields = line.split()
            msgid = fields[0]
            self.set_tags(msgid, fields[1:])
    def __getitem__(self, msgid):
        try: return self.tags[msgid]
        except KeyError: return ('untagged',)
    def has_key(self, msgid):
        return self.tags.has_key(msgid)
    def set_tags(self, msgid, tags):
        self.tags[msgid] = tuple(tags)
        self.update_count += 1
    def __setitem__(self, msgid, tags):
        if tags == (): tags = ('untagged',)
        self.set_tags(msgid, tags)
        self.file.write(' '.join([msgid] + list(tags)) + '\n')
        self.file.flush()

class View:
    def __init__(self, search_terms, tags, mboxlist):
        self.query = Query(search_terms)
        self.mboxlist = mboxlist
        self.tags = tags  # tag store
        self.clear_caches()
    def clear_caches(self):
        self.forward_cache = {}
        self.backward_cache = {}
        self.last_tag_update = self.tags.update_count
    def caches_outdated(self):
        """Probably this is the wrong idea, but we cache search results.

        It's the wrong idea because it's probably possible to make
        search results reliably fast and get rid of all the caching
        logic.  However, in the mean time, this method tells other
        methods in this object whether the cache is invalid.
        """
        return (self.last_tag_update != self.tags.update_count and
                self.query.depends_on_tags())
    # This approach to searching will probably always be too slow.
    def search_forward(self, message_index):
        if self.caches_outdated(): self.clear_caches()
        if message_index not in self.forward_cache:
            try:
                newmi = message_index
                while 1:
                    newmi += 1
                    if newmi in self:
                        self.forward_cache[message_index] = newmi
                        if message_index in self:
                            self.backward_cache[newmi] = message_index
                        break
            except IndexError:
                self.forward_cache[message_index] = message_index
        return self.forward_cache[message_index]
    def search_backward(self, message_index):
        if self.caches_outdated(): self.clear_caches()
        if message_index not in self.backward_cache:
            newmi = message_index
            while newmi > 0:
                newmi -= 1
                if newmi in self:
                    break
            if message_index in self and newmi < message_index:
                self.forward_cache[newmi] = message_index
            self.backward_cache[message_index] = newmi
        return self.backward_cache[message_index]
    def __contains__(self, message_index):
        return self.query.message_matches(self.mboxlist[message_index],
                                          self.tags)

def any(iterable_of_booleans):
    for each in iterable_of_booleans:
        if each: return True
    return False
def all(iterable_of_booleans):
    return not any(not each for each in iterable_of_booleans)

class SearchTerm:
    """Null search term that always matches.  Ancestor of all search terms."""
    def __init__(self, term): self.term = term
    def position_in_string(self, astr):
        """For highlighting matches: return (position, length)."""
        return (len(astr), 0)
    def depends_on_tags(self): return False
    def matches(self, msg, tags): return True
class TagTerm(SearchTerm):
    def depends_on_tags(self): return True
    def matches(self, msg, tags):
        return self.term in tags[message_id(msg)]
class HeaderTerm(SearchTerm):
    def matches(self, msg, tags):
        return (self.term in msg.get(self.header, '') or
                self.term in msg.get_readable(self.header, ''))
class SubjectTerm(HeaderTerm): header = 'subject'
class FromTerm(HeaderTerm): header = 'from'
class ListNameTerm(SearchTerm):
    class SenderTerm(HeaderTerm): header = 'sender'
    class ListPostTerm(HeaderTerm): header = 'list-post'
    def __init__(self, term):
        self.kids = [self.SenderTerm(term), self.ListPostTerm(term)]
    def matches(self, msg, tags):
        return any(kid.matches(msg, tags) for kid in self.kids)
class ToTerm(ListNameTerm):
    class CcTerm(HeaderTerm): header = 'cc'
    class ToTerm(HeaderTerm): header = 'to'
    def __init__(self, term):
        self.kids = [self.CcTerm(term), self.ToTerm(term)]
class WholeMessageTerm(SearchTerm):
    def matches(self, msg, tags): return self.term in msg.as_string()
    def position_in_string(self, astr):
        rv = astr.find(self.term)
        if rv == -1: return SearchTerm.position_in_string(self, astr)
        return rv, len(self.term)
class NotTerm(SearchTerm):
    def __init__(self, term): self.kid = term_factory(term)
    def matches(self, msg, tags): return not self.kid.matches(msg, tags)
    def depends_on_tags(self): return self.kid.depends_on_tags()

term_prefix_table = [
    ('-', NotTerm),
    ('@', TagTerm),
    ('s:', SubjectTerm),
    ('f:', FromTerm),
    ('l:', ListNameTerm),
    ('t:', ToTerm),
    ('', WholeMessageTerm),
]

def term_factory(term):
    for prefix, termtype in term_prefix_table:
        if term.startswith(prefix):
            return termtype(term[len(prefix):])
    assert 0, term

class Query:
    def __init__(self, search_terms):
        self.search_terms = search_terms
        # Ensure it's never empty:
        self.term_objects = ([SearchTerm('')] +
                             map(term_factory, search_terms.split()))
    def depends_on_tags(self):
        return any(term.depends_on_tags() for term in self.term_objects)
    def message_matches(self, msg, tags):
        # This is too slow.
        return all(term.matches(msg, tags) for term in self.term_objects)

def add_highlighted_str(win, query, row, astr):
    win.move(row, 0)
    while astr:
        pos, hitsize = min([term.position_in_string(astr)
                            for term in query.term_objects])
        normal = astr[:pos]
        highlighted = astr[pos:pos+hitsize]
        astr = astr[pos+hitsize:]
        try:
            win.addstr(normal, curses.color_pair(bodytext))
            win.addstr(highlighted, curses.color_pair(white))
        except:
            win.addstr(row, 0, 'ERROR')

def add_wrapped_str(win, row, width, astr, query=Query('')):
    cur_row = row
    while 1:
        front, astr = astr[:width], astr[width:]
        add_highlighted_str(win, query, cur_row, front)
        cur_row += 1
        if not astr: break
        if cur_row >= curses.LINES-2: break
    return cur_row

def realname(addr):
    realname, email_address = email.Utils.parseaddr(addr)
    return realname or email_address

def msgdate(msg):
    try:
        date = email.Utils.parsedate(joinlines(msg['date']))
        return time.strftime('%Y-%m-%d %H:%M', date)
    except:
        return "(couldn't parse date)"

def message_id(msg):
    return joinlines(msg.get('message-id',
                             'spam without a message id')).replace(' ', '-')

yellow = 1
white = 2
bodytext = 3

def adjwidth(astr, width):
    if len(astr) < width: return astr + ' ' * (width - len(astr))
    else: return astr[:width]

def draw_hdr_line(stdscr, msg, tagstore=None):
    name = realname(msg.get_readable('from', '(no sender)')) + ' '
    date = ' ' + msgdate(msg)
    tags = ''
    remaining_space = curses.COLS - len(name) - len(date)
    if tagstore:
        tags = (' ' + ' '.join(tagstore[message_id(msg)]))[:remaining_space]
    subj = adjwidth(msg.get_readable('subject', '(no subject)'),
                    remaining_space - len(tags))
    try:
        stdscr.addstr(name, curses.color_pair(white) | curses.A_BOLD)
        stdscr.addstr(subj, curses.color_pair(yellow) | curses.A_BOLD)
        stdscr.addstr(tags, curses.color_pair(yellow))
        stdscr.addstr(date, curses.color_pair(white) | curses.A_BOLD)
    except:
        stdscr.addstr('error displaying header line',
                      curses.color_pair(yellow) | curses.A_BOLD)

def last_message(mboxlist):
    message_index = mboxlist.last_known_message()
    try:
        while 1:
            mboxlist[message_index]
            message_index += 1
    except (IndexError, KeyboardInterrupt):
        return message_index - 1

def set_search(stdscr, search_terms):
    editwindow = stdscr.derwin(1, curses.COLS, 0, 0)
    editwindow.clear()
    if search_terms: search_terms += ' '
    editwindow.addstr(search_terms)
    textbox = curses.textpad.Textbox(editwindow)
    return textbox.edit().strip()

def write_to_file(file_to_write, msg):
    fp = open(file_to_write, 'ab')
    try:
        fp.write(msg)
    finally:
        fp.close()  # close even if the write fails

def subject_terms(message):
    return ' '.join('s:' + term for term in
                    message.get_readable('subject', '').split()
                    if term.lower() != 're:')

class MailBrowser:
    def __init__(self, stdscr, mboxfilename):
        self.stdscr = stdscr
        self.mboxlist = MessageListFacade(mboxfilename)
        self.mboxlist.read_cached_metadata()
        self.go_to(0)
        self.tags = tagstore()
        self.view = View('', self.tags, self.mboxlist)
        self.detail_levels = [self.display_message_summary,
                              self.display_message_body,
                              self.display_message_source]
        self.detail_level = 1
    def go_to(self, new_message_index):
        # XXX a KeyboardInterrupt in this routine could cause bad behavior
        self.message_index = new_message_index
        self.current_message = self.mboxlist[self.message_index]
        self.lineoffset = 0
    def flash_msg(self, msg):
        self.stdscr.move(0, 0)
        flashing = curses.color_pair(yellow) | curses.A_BLINK | curses.A_BOLD
        self.stdscr.addstr(msg, flashing)
        self.stdscr.refresh()
    def more_detail(self):
        if self.detail_level < len(self.detail_levels) - 1:
            self.detail_level += 1
        self.lineoffset = 0
    def less_detail(self):
        if self.detail_level > 0: self.detail_level -= 1
        self.lineoffset = 0
    def display_message_body(self):
        self.redraw(self.current_message.utf8_body_lines())
    def display_message_source(self):
        self.redraw(self.current_message.as_string_lines())
    def redraw(self, lines):
        self.stdscr.bkgd(' ', curses.color_pair(bodytext))
        self.stdscr.clear()
        self.stdscr.attrset(curses.color_pair(yellow) | curses.A_BOLD)
        self.stdscr.addstr(0, 0, ' ' * curses.COLS)
        self.stdscr.addstr(1, 0, ' ' * curses.COLS)
        self.stdscr.addstr(0, 0, "[q]uit [n]ext [t]ag")
        if self.message_index != 0: self.stdscr.addstr(" [p]revious")
        self.stdscr.addstr('  ' + ' '.join(self.tags[
            message_id(self.current_message)]))

        self.stdscr.move(1, 0)
        draw_hdr_line(self.stdscr, self.current_message)
        self.stdscr.attrset(curses.color_pair(bodytext))

        row = 2
        lineoffset = self.lineoffset
        while row < curses.LINES:
            # Stop at the end of the message rather than indexing past it.
            try: line = lines[lineoffset]
            except IndexError: break
            row = add_wrapped_str(self.stdscr, row, curses.COLS, line,
                                  self.view.query)
            lineoffset += 1
    def display_message_summary(self):
        # This is painfully slow sometimes, depending on your
        # search.  Thus the .refresh().
        self.stdscr.bkgd(' ', curses.color_pair(white))
        self.stdscr.clear()
        message_index = self.message_index
        try:
            for ii in range(curses.LINES - 1):
                self.stdscr.move(ii, 0)
                draw_hdr_line(self.stdscr, self.mboxlist[message_index],
                              self.tags)
                self.stdscr.refresh()
                next_mi = self.view.search_forward(message_index)
                if next_mi == message_index: break  # no more matches!
                message_index = next_mi
        except KeyboardInterrupt:
            pass
    def toggle_spam_tag(self):
        msgid = message_id(self.current_message)
        curtags = self.tags[msgid]
        if 'spam' in curtags:
            self.tags[msgid] = tuple([tag for tag in curtags if tag != 'spam'])
        else:
            self.tags[msgid] = tuple([tag for tag in curtags
                                      if tag != 'untagged'] + ['spam'])

    def tag_message(self):
        msgid = message_id(self.current_message)
        editwindow = self.stdscr.derwin(1, curses.COLS, 0, 0)
        editwindow.clear()
        if self.tags.has_key(msgid):
            editwindow.addstr(' '.join(self.tags[msgid]))
        textbox = curses.textpad.Textbox(editwindow)
        # Store a tuple, like toggle_spam_tag and write_reply do.
        self.tags[msgid] = tuple(textbox.edit().split())
    def change_search(self):
        self.view = View(
            set_search(self.stdscr, self.view.query.search_terms),
            self.tags, self.mboxlist)
    def debug_dump(self):
        self.stdscr.clear()
        items = self.current_message.cached_metadata.items()
        row = add_wrapped_str(self.stdscr, 0, curses.COLS,
                              repr([(k, id(v), v) for k, v in items]))
        add_wrapped_str(self.stdscr, row, curses.COLS,
                        repr(self.mboxlist.__dict__))

    def count_messages(self):
        self.flash_msg("Sorry, counting...")
        try:
            tmp_mi = -1
            count = 0
            while 1:
                new_mi = self.view.search_forward(tmp_mi)
                if new_mi == tmp_mi: break
                count += 1
                tmp_mi = new_mi
            self.flash_msg('%d messages match ' % count)
        except KeyboardInterrupt: return False  # didn't count
        return True                             # did count
    def summary_move_page(self, search_direction):
        self.flash_msg("Sorry, searching...")
        try:
            for ii in range(curses.LINES-2):
                new_mi = search_direction(self.message_index)
                if new_mi == self.message_index: break
                self.go_to(new_mi)
        except KeyboardInterrupt: pass
        self.detail_level = 0
    def go_to_end(self):
        self.flash_msg("Reading to end of mailbox...")
        self.go_to(last_message(self.mboxlist))
        # Note that this still updates the cache if the user hit ^C:
        self.flash_msg("Updating cached mailbox summary...")
        start = time.time()
        self.mboxlist.write_cached_metadata()
        #self.flash_msg("Updating cache took %0.3f seconds" %
        #               (time.time() - start))
        #continue

    def go_forward(self):
        self.go(self.view.search_forward)
    def go_backward(self):
        self.go(self.view.search_backward)
    def go(self, what_direction):
        self.flash_msg("Sorry, searching...")
        try:
            self.go_to(what_direction(self.message_index))
        except KeyboardInterrupt:
            pass
    def scroll(self, howmuch):
        self.lineoffset += howmuch
        if self.lineoffset < 0: self.lineoffset = 0
    def same_subject_view(self):
        return View(subject_terms(self.current_message),
                    self.tags, self.mboxlist)
    def write_reply(self):
        self.flash_msg('writing reply...')
        me = 'Kragen Javier Sitaker <[EMAIL PROTECTED]>'
        reply = ('~m~r\n' +
                 reply_skeleton(self.current_message, me).replace('~', '~t') +
                 '~e\n')
        write_to_file('tmp.replies', reply)
        msgid = message_id(self.current_message)
        curtags = self.tags[msgid]
        if 'replied' not in curtags:
            self.tags[msgid] = tuple([tag for tag in curtags
                                      if tag != 'toreply']) + ('replied',)

    def mainloop(self):
        file_to_write = 'tmp.mail'

        cargo_cult_routine(self.stdscr)
        curses.init_pair(yellow, curses.COLOR_YELLOW, curses.COLOR_BLUE)
        curses.init_pair(white, curses.COLOR_WHITE, curses.COLOR_BLUE)
        curses.init_pair(bodytext, curses.COLOR_BLACK, curses.COLOR_WHITE)

        key_table = {
            curses.KEY_DOWN: self.go_forward,  'n': self.go_forward,
            curses.KEY_UP:   self.go_backward, 'p': self.go_backward,
            ' ':  lambda: self.scroll(+4),
            curses.KEY_BACKSPACE:  lambda: self.scroll(-4),
            8:  lambda: self.scroll(-4),  # ^H, which some terminals send for backspace
            '[': lambda: self.go(self.same_subject_view().search_backward),
            ']': lambda: self.go(self.same_subject_view().search_forward),
            't': self.tag_message,
            curses.KEY_LEFT: self.less_detail,
            curses.KEY_RIGHT: self.more_detail, '\n': self.more_detail,
            curses.KEY_PPAGE:
                lambda: self.summary_move_page(self.view.search_backward),
            curses.KEY_NPAGE:
                lambda: self.summary_move_page(self.view.search_forward),
            's': self.toggle_spam_tag,
            '>': self.go_to_end,
            'r': self.write_reply,
        }
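        # getch() returns integers, so register each character key under
        # its ord() as well.  Iterate over a copy, since we add entries.
        for key, val in list(key_table.items()):
            if not isinstance(key, int): key_table[ord(key)] = val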

        self.detail_levels[self.detail_level]()
        while 1:
            try:
                ch = self.stdscr.getch()
                if ch in key_table: key_table[ch]()
                elif ch == -1: pass
                elif ch == ord('d'):  # debug
                    self.debug_dump()
                    continue
                elif ch == ord('/'):  # search (not incremental, sadly)
                    self.change_search()
                    if self.message_index not in self.view: self.go_forward()
                elif ch == ord('?'):  # search backwards
                    self.change_search()
                    self.go_backward()
                elif ch == ord('q'): return  # quit
                elif ch == ord('#'):
                    # 'continue' to avoid erasing message count
                    if self.count_messages(): continue
                elif ch == ord('f'):  # write to file, or forward
                    self.flash_msg("Writing...")
                    write_to_file(file_to_write,
                                  self.current_message.as_string())
                    self.flash_msg("Written   ")
                    continue  # to not erase the flashing message
            except KeyboardInterrupt:
                pass
            self.detail_levels[self.detail_level]()

def realmain(stdscr, argv):
    MailBrowser(stdscr, argv[1]).mainloop()

def main(argv):
    cgitb.enable(format="text")
    curses.wrapper(lambda stdscr: realmain(stdscr, argv))
if __name__ == '__main__': main(sys.argv)
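
For reference, here is a minimal curses-free sketch of the slicing that
add_wrapped_str above uses to wrap a long line.  The name wrap_chunks and
its max_rows parameter are hypothetical and are not part of the client;
max_rows only stands in for the check against curses.LINES.  It is meant
as an illustration of the loop, not as a replacement for it.

```python
def wrap_chunks(astr, width, max_rows=None):
    """Split astr into width-sized pieces, one piece per screen row.

    Hypothetical helper mirroring the loop in add_wrapped_str: it
    always yields at least one piece, so an empty line still takes
    up one row.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    chunks = []
    while True:
        # Peel off the next row's worth of text.
        front, astr = astr[:width], astr[width:]
        chunks.append(front)
        if not astr:
            break  # nothing left to wrap
        if max_rows is not None and len(chunks) >= max_rows:
            break  # ran out of rows; the rest is dropped
    return chunks
```

Each returned piece corresponds to one call to add_highlighted_str in the
real function, which is why a very long line can use up several rows.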


-- 
Kragen Javier Sitaker in Caracas, trying to get a clue
