[Email-SIG] Proposed Enhanced MIME Handling API

R. David Murray Thu, 08 Aug 2013 09:36:45 -0700

If you read my blog (http://www.bitdance.com/blog, or on planet python), you
know that I'm self-funding working on the email package for at least the month
of August.  The first fruits of that work are now ready for discussion: a
design document for improving the MIME handling API.


The ReST content below can be viewed as html at:

    http://www.bitdance.com/blog/2013/08/08_01_Proposed_Enhanced_MIME_Handling_API/

As it says at the end, feedback welcome.



Proposed Enhanced MIME Handling API
===================================

The thoughts herein are my own, without any feedback from anyone else (yet),
and consist of a somewhat-organized brain dump of my current thoughts.  This is
a starting point for discussion.  I'll also be posting this proposal on the
`Email SIG`__ mailing list, which is where the discussion will take place.  If
you wish to participate in the discussion and aren't already signed up, please
join us on the list.

__ http://mail.python.org/mailman/listinfo/email-sig

This post is long, but worth reading if you deal with email at all.


The Current API
---------------

Currently in the email package the API for dealing with MIME messages consists
of two pieces: an API for constructing MIME parts, and an API for querying a
message and its parts to find out what kind of MIME they are, and obtaining the
payload of an individual MIME part as either a text string or a ``bytes``
object, depending on the MIME type (text or not-text, respectively).  These two
APIs are pretty much completely distinct.  To create a MIME part, you import
the appropriate MIME class and instantiate an instance of it, passing the
constructor type-appropriate parameters.

For reference, the supported MIME types for constructing parts are:

   ===============================================  ===========================================
   Class                                            Arguments
   ===============================================  ===========================================
   :class:`email.mime.multipart.MIMEMultipart`      ``(subtype, boundary, subparts, **params)``
   :class:`email.mime.application.MIMEApplication`  ``(data, subtype, encoder, **params)``
   :class:`email.mime.audio.MIMEAudio`              ``(data, subtype, encoder, **params)``
   :class:`email.mime.image.MIMEImage`              ``(data, subtype, encoder, **params)``
   :class:`email.mime.message.MIMEMessage`          ``(msg, subtype)``
   :class:`email.mime.text.MIMEText`                ``(text, subtype, charset)``
   ===============================================  ===========================================

Except for the fact that non-\ ``multipart`` objects override ``attach`` to
raise an error, these classes consist entirely of ``__init__`` code.  That is,
their entire purpose is to take the arguments passed to them and update the
base :class:`~email.message.Message` model with that information.

Except for ``MIMEMultipart``, their signatures are very similar.  ``MIMEText``
has a charset argument rather than having any way to take arbitrary parameters
for the :mailheader:`Content-Type` header the way the other non-\ ``multipart``
classes do, since for ``text`` parts only ``charset`` has a defined meaning.

The ability to set parameters on the :mailheader:`Content-Type` header is
necessary, but ``MIME`` has moved on since these classes were written.  Now one
also needs the ability to set the ``filename`` parameter of the
:mailheader:`Content-Disposition` header, not to mention setting the value of
that header itself (``inline`` versus ``attachment``).  Thus, while the purpose
of these classes is to make it easy to create MIME objects for the various main
types via a single constructor call, in practice one must also call
:meth:`~email.message.Message.add_header` to add the
:mailheader:`Content-Disposition` header.
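
As a small illustration of that extra step, here is a sketch using the existing classes.  The payload bytes and the file name are made up for the example; only the ``email`` calls themselves are existing API.

```python
from email.mime.application import MIMEApplication

# Placeholder payload; a real program would read these bytes from a file.
data = b"col1,col2\n1,2\n"

# Extra keyword arguments to the constructor become Content-Type parameters...
part = MIMEApplication(data, "octet-stream", name="report.csv")

# ...but Content-Disposition still needs a separate add_header() call.
part.add_header("Content-Disposition", "attachment", filename="report.csv")

print(part["Content-Type"])
print(part["Content-Disposition"])
```

The ``name`` keyword lands on :mailheader:`Content-Type`, while ``filename`` has to be attached to the second header by hand.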

Also note that while the API provides a way to control the
:mailheader:`Content-Transfer-Encoding` of non-text parts, it does not do the
same for text parts.  There if one wants to control the encoding one must play
around with :mod:`~email.charset` definitions.

You might also notice that in the above table I've used "normal" formal
argument names, instead of the actual ``_`` prefixed names used by the classes.
The prefix is used so that the :mailheader:`Content-Type` parameter names may
be spelled normally, without a risk of clashing with the constructor argument
names.

With the current API, constructing a MIME message looks something like this
(note that I haven't actually tested this code)::

   >>> from email.mime.multipart import MIMEMultipart
   >>> from email.mime.text import MIMEText
   >>> from email.mime.image import MIMEImage
   >>> from email.mime.application import MIMEApplication
   >>> rel = MIMEMultipart('related')
   >>> with open('myimage.jpg', 'rb') as f:  # binary mode: image data is bytes
   ...     data = f.read()
   ...
   >>> img = MIMEImage(data, 'jpeg')
   >>> img.add_header('Content-Disposition', 'inline')
   >>> img.add_header('Content-ID', '<image1>')
   >>> body = MIMEText('<p>My test <img src="cid:image1"></p>\n',
   ...                 'html')
   >>> rel.attach(body)
   >>> rel.attach(img)
   >>> alt = MIMEMultipart('alternative')
   >>> alt.attach(MIMEText('My test [image1]\n'))
   >>> alt.attach(rel)
   >>> msg = MIMEMultipart('mixed')
   >>> msg.attach(alt)
   >>> with open('spreadsheet.xls', 'rb') as f:
   ...     data = f.read()
   ...
   >>> xls = MIMEApplication(data, 'vnd.ms-excel')
   >>> xls.add_header('Content-Disposition', 'attachment',
   ...                filename='spreadsheet.xls')
   >>> msg.attach(xls)
   >>> msg['To'] = 'email-sig@python.org'
   >>> msg['From'] = 'rdmur...@bitdance.com'
   >>> msg['Subject'] = 'A sample basic modern email'

Not too bad, but there's a lot one has to know about how MIME messages are
constructed to get it right.

Now, on the flip side, if we *receive* the above message, the parser turns it
into a tree of :class:`~email.message.Message` objects (no MIME subclasses).  We
have the :meth:`~email.message.Message.is_multipart` method to determine if a
part is a multipart or not, and we can query the headers and header parameters
to find out other information about the part.  To get the content, we use
:meth:`~email.message.Message.get_payload`.  For a text part, that gets us the
body as a string.  For a non-text part, we have to pass it ``decode=True``,
which gets us the body as a ``bytes`` object.  (This awkward ``get_payload``
API is a legacy of the translation from Python2 to Python3.)
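
To make the access side concrete, here is a minimal sketch of reading a parsed message with the existing API.  The raw message text, the addresses, and the boundary string are invented for the example.

```python
import email

raw = (
    "From: sender@example.com\n"
    "To: receiver@example.com\n"
    "Subject: demo\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XXX"\n'
    "\n"
    "--XXX\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "Hello\n"
    "--XXX\n"
    "Content-Type: application/octet-stream\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "AAEC\n"
    "--XXX--\n"
)

msg = email.message_from_string(raw)

found = []
for part in msg.walk():
    if part.is_multipart():
        continue  # containers carry no content of their own
    if part.get_content_maintype() == "text":
        content = part.get_payload()             # str
    else:
        content = part.get_payload(decode=True)  # bytes
    found.append((part.get_content_type(), content))
```

The caller has to inspect the content type just to decide how to call ``get_payload``, which is the awkwardness described above.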


Email-SIG Initial Design Thoughts Run Aground
---------------------------------------------

In redesigning the email package API, one of the Email-SIG's goals was to
remove, as far as possible, the need for the library user to be an email expert
in order to compose and process messages.  To this end we have already redesigned
the header parsing so that all the heavy lifting is done behind the scenes: now
you deal with text strings, and get and set the information from the header
objects (such as addresses or MIME parameters) via attributes.

For the MIME bodies, we visualized that the parser would produce type-specific
objects that would have type-specific attributes that you could set and query
in order to manipulate a part.  This was left very grey from a detailed design
standpoint, but conceptually we expected there to be a registry of MIME types
which the parser would use to create specific MIME objects, much like the
header registry we added for dealing with header parsing.  We expected that we
would then extend the existing MIME type objects with new type-specific APIs to
make them more useful, such as being able to query the image-related
attributes (size, resolution, etc.) of a MIMEImage object.

We didn't really think too much about the message creation side of things; the
existing constructors seemed adequate, perhaps with a few enhancements.

So last week I started implementing the MIME registry, beginning with copying
the :mod:`~email.headerregistry` code and starting to adapt it to the new
goal.

I quickly ran into issues.  The current parser operates by creating an empty
:class:`~email.message.Message` object and stuffing parsed headers into it.
It then parses the body and adds that information.  So to create the parts as a
custom MIME type, the parser would either need to pre-parse the headers and
look up the content-type, or it would have to create the new class after having
already built a ``Message`` object, and copy all the information from the
existing object to the new specialized class.

Pre-parsing the headers isn't crazy.  At one point during the header parsing
phase of this project I had a separate object to hold headers, specifically
with this eventuality in mind.  I ended up discarding it, though, because of
the way that we ended up implementing the :mod:`~email.policy` API.  It would
not be impossible to resurrect that separate header object, but it would be a
pain, and would result in a non-trivial amount of extra complexity in the
``Message`` code.

The second alternative had a code smell to me.

So, as I mentioned in the previous blog post, I took a long walk and thought
about why it bothered me, and realized a few things.


Thoughts About An Alternative
-----------------------------

The breakdown above of the existing API into the parsing/access API and the
creation API came out of that cogitation.  Previously, I hadn't really
understood that the mime classes are *just* about creation of parts.  They have
no function during parsing/access.  This design makes sense: by the design of
the MIME RFCs, a MIME part follows a standard syntax and a certain set of
shared semantics.  Aside from the type labels, the differences between types
are encapsulated in allowed MIME parameters and their values, and then of
course the content of the part itself.  But that content can always be viewed
as a data stream, and the parameters all have a common syntax and therefore
share a common access methodology.

In other words, at the parsing level and the access mechanics level there is no
need to differentiate the MIME parts based on type, other than the distinction
between the multipart type and all the other types.  All non-\ ``multipart``
types can be (and are, currently) treated the same.  (Well, except for
``message``, which is a pseudo-\ ``multipart`` in the current implementation,
but that's an artifact of the implementation of the model, not a fundamental
part of the model.)

The point at which the types matter is the point at which we want to access (or
store, for a part we are creating) the *content*.

Given that our desire is to encode the details of message construction inside
the library, so that the library user doesn't need to know about them, what we
really want is a way to retrieve the *content* from the message, and a way to
store *content* into a message.  Ideally, as a library user we shouldn't even
have to worry about assigning the content type!

Is this possible?  I believe it is, at least most of the time.

Consider a simple Image part.  We'd like to be able to get the image out of the
part without having to think about any of the MIME details.  Currently we do
need to think about them, at least to the extent of checking the subtype to
find out what kind of image data we have, and then calling the right code to
turn the binary data we extract from the part into an object of an appropriate
and useful type.

What if we could do this:

    >>> imgpart.get_content()
    <PIL.JpegImagePlugin.JpegImageFile image mode=RGB 
        size=2592x1944 at 0xECD3E4C>

Or alternatively:

    >>> imgpart.get_content()
    '/tmp/tmp4e2nmy.jpg'

And on the input side, suppose we could do:

    >>> img = Message().set_content(mypilimageobj)

or

    >>> img = Message().set_content('myimage.jpg')

In these examples we are treating the ``Message`` object as the generic MIME
container it is, and getting and setting the type-specific content.

Now, before anyone panics, I'm not proposing to make the email package depend
on pillow__ in any way.  I have a much more generic idea in mind here.

__ https://pypi.python.org/pypi/Pillow


A Framework for Handling MIME
-----------------------------


There are two conceptual elements to this proposed framework.  The first
element deals with creating MIME parts from content objects and extracting
content objects from MIME parts, and the second element deals with creating a
multipart message by combining MIME parts.


Content Management
~~~~~~~~~~~~~~~~~~

A sketch of the end-user interface is shown in the preceding section.  To
implement it, we introduce the concept of a "content manager".  A content
manager is somewhat analogous to our header registry, in that it is a registry
and it can be accessed through the current policy.  Its operation is
significantly different, however.

The full signatures of these proposed new ``Message`` methods are::

    get_content(*args, content_manager=None, **kw)
    set_content(*args, content_manager=None, **kw)

If *content_manager* is not specified, the default content manager specified
by the ``Message``\ 's current policy is used.

A content manager has two methods that correspond to the ``get_content`` and
``set_content`` ``Message`` methods proposed above.  These methods take a
message object as their first non-``self`` argument.  (That is, they are really
double-dispatch methods.)  When ``get_content`` and ``set_content`` are called
on ``Message``, the ``content_manager``\ 's corresponding methods are called,
passing the ``Message`` object as the first argument.

The content manager is responsible for populating a bare ``Message`` object
with the data needed to encode whatever content is passed to its
``set_content`` method, and for turning the data stored in a parsed part into a
useful object when its ``get_content`` method is called.  *How* it does this is
completely up to the content manager.  The get and set methods are the only
required part of the API.  In fact, only the *names* of the methods and their
first argument (the ``Message``) are part of the API:  get and set methods may
take an arbitrary number of additional positional and keyword arguments.
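
A toy sketch of this double dispatch follows.  The class names ``UpperCaseManager`` and ``SketchMessage`` and the ``default_manager`` attribute are hypothetical stand-ins invented for the example; the last of these takes the place of "the content manager of the current policy".

```python
from email.message import Message


class UpperCaseManager:
    """Toy content manager: only the two method names and the first
    argument (the message) are fixed by the proposed API."""

    def get_content(self, msg, *args, **kw):
        return msg.get_payload().upper()

    def set_content(self, msg, text, *args, **kw):
        msg["Content-Type"] = "text/plain"
        msg.set_payload(text)


class SketchMessage(Message):
    """Hypothetical Message subclass carrying the two proposed methods."""

    # Stand-in for the default content manager supplied by the policy.
    default_manager = UpperCaseManager()

    def get_content(self, *args, content_manager=None, **kw):
        if content_manager is None:
            content_manager = self.default_manager
        # Hand ourselves to the manager as its first argument.
        return content_manager.get_content(self, *args, **kw)

    def set_content(self, *args, content_manager=None, **kw):
        if content_manager is None:
            content_manager = self.default_manager
        content_manager.set_content(self, *args, **kw)


msg = SketchMessage()
msg.set_content("hello")
print(msg.get_content())
```

The message never looks at the content itself; everything type-specific lives in the manager.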

The email package will provide a registry based content manager base class.
It will manage two mappings:

The "get" mapping maps from MIME types to a function.   This function takes the
``Message`` object as its argument and returns an arbitrary value.  Any
additional arguments or keywords to the ``get_content`` method are passed
through to it, but in most cases there will be none.

The "set" mapping maps from a Python type to a function.  The type is looked
for in several ways: first by identity (using the type itself as the key), then
using the type's ``__qualname__``, and finally using the type's ``__name__``.
This base content manager class's ``set_content`` function has an additional
required positional argument beyond that specified by the content manager API
itself: the object whose type will be looked up in the registry.  The function
returned by the registry takes two positional arguments, the ``Message`` object
and the object passed to the ``set_content`` method.  Any additional arguments,
positional or keyword, are also passed through to the function returned by the
registry.
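
The two mappings and the lookup order could be sketched roughly as follows.  The class name ``RegistrySketch``, its attribute names, and the two handler functions are all invented for illustration, and the "get" side here assumes that only the full ``maintype/subtype`` string is used as a key.

```python
from email.message import Message


class RegistrySketch:
    """Hypothetical registry-based content manager."""

    def __init__(self):
        self.get_handlers = {}  # MIME type string -> function(msg, ...)
        self.set_handlers = {}  # type, qualname or name -> function(msg, obj, ...)

    def get_content(self, msg, *args, **kw):
        handler = self.get_handlers[msg.get_content_type()]
        return handler(msg, *args, **kw)

    def set_content(self, msg, obj, *args, **kw):
        typ = type(obj)
        # Lookup order from the text: the type itself, then its
        # __qualname__, then its __name__.
        for key in (typ, typ.__qualname__, typ.__name__):
            if key in self.set_handlers:
                return self.set_handlers[key](msg, obj, *args, **kw)
        raise KeyError(typ.__qualname__)


def set_text(msg, text, subtype="plain"):
    msg["Content-Type"] = "text/" + subtype
    msg.set_payload(text)


def get_text(msg):
    return msg.get_payload()


manager = RegistrySketch()
manager.set_handlers["str"] = set_text        # registered by __name__
manager.get_handlers["text/html"] = get_text

msg = Message()
manager.set_content(msg, "<p>hi</p>", "html")  # extra argument is passed through
```

Registering by name rather than by type object means an application can add a handler for a type without importing the module that defines it.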

There will doubtless be numerous instances or subclasses of the content manager
with different registry entries, depending on the needs of particular
applications.  If this proposal is accepted, I envision shipping the email
package with three built-in content manager subclasses:  a ``RawDataManager``,
a ``FileManager``, and an ``ObjectManager``.

RawDataManager
    This manager will provide no more facilities than the current MIME classes
    do.  The signature of its ``set_content`` method is::

        set_content(msg, string, subtype="plain", cte=None,
                    disposition=None, filename=None, cid=None,
                    params=None)
        set_content(msg, bytes, maintype, subtype, cte=None,
                    disposition=None, filename=None, cid=None,
                    params=None)

    This is a direct replacement for the existing non-\ ``multipart``
    constructors shown above.  It adds the ability to set the value of
    the :mailheader:`Content-Disposition` header, the ``filename``
    (which is a parameter on the :mailheader:`Content-Disposition` header), a
    way to set the :mailheader:`Content-ID` header value, uses the name for the
    content transfer encoding rather than an :mod:`email.encoders` object, and
    groups the extra parameters (which for the ``text`` type includes
    ``charset``) into a single dictionary rather than allowing them to be
    keywords.

    The reason for this last change is both to avoid needing to use a ``_``
    prefix for the other, more commonly used arguments, and to make it clear
    that these values are different from the Python keyword parameters:  they
    are not checked for validity, they are simply passed through onto the
    :mailheader:`Content-Type` header.  In other words, you should use this
    facility only when you do know what you are doing.

    (Note: I'm not certain switching away from ``encoders`` is a good idea,
    it's a thought experiment that will be further informed by the
    implementation.)

    The ``get_content`` method returns a string if the maintype of the part is
    ``text``, and a ``bytes`` object otherwise.  To find out the nature of the
    data, you must interrogate the content type (and possibly its parameters),
    just as you do with the existing email API.

    ``RawDataManager`` is designed to give you the maximum amount of control
    while still making the API simpler to use.  You should use this manager
    only if you need that level of control, and know what you are doing.

FileManager
    The ``set_content`` method of this manager takes a file system path, and
    its ``get_content`` method returns a filename.  The constructor of this
    content manager will optionally take a path representing a directory, which
    will be used as the starting point for interpreting the paths passed to
    ``set_content``, and the directory in which the files returned by
    ``get_content`` will be located.  If a directory is not specified, paths
    will be relative to the current working directory.  ``set_content`` will
    use the ``mimetypes`` module to guess the appropriate mime type.
    ``get_content`` will use ``mimetypes`` to determine the appropriate
    extension for the file if the part has no ``name`` or ``filename`` MIME
    parameter.  ``set_content`` will also accept the non-mime-type keywords
    supported by the ``RawDataManager``.  If ``filename`` is not specified the
    filename (without any leading directory path) of the path passed as the
    first argument is used.

    Ideally the manager will set additional :mailheader:`Content-Type`
    parameters when it can figure out the correct values from the input data.
    Explicit values passed in the *params* dict would override these computed
    values.

    This content manager is suitable for something like a Mail User Agent,
    where extracting attachments to disk and reading attachments from disk are
    the most common operations.

    (One can also imagine a ``MailcapManager``, which would actually call the
    appropriate mailcap-specified program when ``get_content`` is called, but
    that is something for an MUA author to write, not something to ship with
    the standard library.)

ObjectManager
    This manager is closest in spirit to the original Email SIG proposal, and
    is possibly the one that the default policy will use.  The registry maps
    between MIME types and specialized objects.

    The objects returned by this manager's ``get_content`` will depend on
    whether or not the stdlib provides any suitable object.  For ``message``
    type objects, for example, we can return a :class:`~email.message.Message`
    object.  For ``audio`` we could return an appropriate reader object for
    :mod:`aifc` and :mod:`wave` files.  For ``text`` types we would obviously
    return a string.  For the rest, the best we can do is to return a bytes
    object.  However, an application is free to register additional type object
    methods, and the content manager functions the application registers will
    probably be able to take advantage of utility functions provided by the
    content manager module to make the resulting functions fairly
    straightforward to write.  (This is how one could get a ``pillow`` object
    when calling ``get_content``.)

    For ``set_content``, the ``str`` type uses the same signature used by the
    ``RawDataManager`` for the ``text`` type, except that it does not support
    passing in arbitrary extra parameters.  (This is for the same reason
    ``MIMEText`` doesn't support it:  there are no *defined* additional
    parameters for ``text`` parts other than ``charset``.)

    For other types I will try to directly support the RFC defined parameters
    both here and in the ``FileManager``.  But there are so many that it won't
    be practical to handle them all, so there will still be a *params* keyword
    argument to pass arbitrary additional parameters.  Among the valid input
    types will be anything handled by the standard library that I have time to
    implement (eg: :mod:`aifc`, :mod:`wave`, :class:`email.message.Message`).
    For images, there will be a utility class you can pass a ``bytes`` object
    or filename to which will use :mod:`imghdr` to determine the image type.
    The resulting instance can then be passed to ``set_content``.

    A ``bytes`` object or a file opened in binary mode will be treated as type
    ``application``, and will require that the MIME subtype be passed
    explicitly.
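
As a rough sketch of how the ``bytes`` signature of ``RawDataManager`` and the type guessing of ``FileManager`` might be built from existing API: the function names below are hypothetical, the ``cte`` argument is left out (base64 is always used), and defaulting the disposition to ``attachment`` when only a filename is given is an assumption of the sketch, not part of the proposal.

```python
import mimetypes
import os.path
import tempfile
from email import encoders
from email.message import Message


def set_bytes_content(msg, data, maintype, subtype, disposition=None,
                      filename=None, cid=None, params=None):
    """Sketch of the proposed bytes signature (always base64-encodes)."""
    msg.add_header("Content-Type", f"{maintype}/{subtype}", **(params or {}))
    msg.set_payload(data)
    encoders.encode_base64(msg)  # also adds Content-Transfer-Encoding
    if disposition or filename:
        extra = {"filename": filename} if filename else {}
        # Assumption: fall back to "attachment" if only a filename is given.
        msg.add_header("Content-Disposition", disposition or "attachment",
                       **extra)
    if cid:
        msg["Content-ID"] = f"<{cid}>"


def set_file_content(msg, path, **kw):
    """Sketch of the FileManager idea: guess the type, default the filename."""
    ctype, _ = mimetypes.guess_type(path)
    maintype, subtype = (ctype or "application/octet-stream").split("/")
    kw.setdefault("filename", os.path.basename(path))
    with open(path, "rb") as f:
        set_bytes_content(msg, f.read(), maintype, subtype, **kw)


# Usage: write a throwaway file and turn it into a MIME part.
payload = b"not really a PNG"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "logo.png")
    with open(path, "wb") as f:
        f.write(payload)
    part = Message()
    set_file_content(part, path, cid="image1")
```

The caller never names a MIME type; the file extension and the path supply the type and the ``filename`` parameter.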

Obviously each of these content managers is useful in different circumstances,
quite possibly even within the same application, which is why the
``set_content`` and ``get_content`` methods of ``Message`` accept a
``content_manager`` keyword argument.

Note in particular that the current email package doesn't explicitly support
the ``video`` maintype, and the standard library has no video-oriented
utilities.  So for this type you will have to use the ``RawDataManager`` or the
``FileManager`` and do your own parameter setting (although we might consider
creating a ``Video`` utility class just to allow the mimetype to get set
automatically).


Building Multipart Messages
~~~~~~~~~~~~~~~~~~~~~~~~~~~

A MIME multipart message can have an arbitrarily complex structure.  But
conceptually we can break down (most) messages into a relatively simple
structure:  the message will have a "body" and one or more "attachments".  The
"body" is generally one of three things:  either a simple ``text/plain`` part, a
simple ``text/html`` part, or a ``multipart/related`` part consisting of a
``text/html`` part and zero or more parts that are referenced from the ``html``
part.  Complicating this simple picture, a message may have more than one
version of the "body" of varying degrees of "richness" (plain text versus html
being by far the most common).

Most email processing programs want to find the "body" first.  Some will want
only the simplest available text part, while others will prefer the complete
data for the richest version.  You might also have a processor that wanted html
if it was available, but would ignore everything else in a ``related`` part if
there was one.

Using the existing email API, a program generally will use the ``walk`` method
to walk down the tree of parts, looking for the part of the type it is most
interested in.  This is such a common task that it would be nice to have a
direct API for it.  I propose the following method:

    get_body(preferencelist=('related', 'html', 'text'))

*preferencelist* is a tuple of strings that indicates the order of preference
for the part returned.  If ``html`` is included in the list and ``related`` is
not, then the ``html`` part of a ``related`` part would be returned if there is
no separate ``html`` part.  If only ``text`` is specified and there is no
``text`` part, ``None`` is returned.  Likewise if only ``html`` is specified
and there is no ``html`` part.  Specifying ``related`` by itself is an error;
the preference list must always contain at least one of ``text`` or
``html``.  (There is an edge case:  if there is no ``multipart/related`` but
there are both ``html`` and ``text`` parts in a ``multipart/mixed``, what
should the behavior be?  Probably the first one should be treated as the only
body candidate and the other treated as an attachment, but real world data
might recommend otherwise.)
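
One way these rules could be realised on top of the existing API is sketched below.  The names ``find_body`` and ``_candidates`` are hypothetical, and the sketch assumes that the first subpart of a ``multipart/mixed`` or ``multipart/related`` is its body or root part (it ignores the ``start`` parameter).

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def _candidates(part, in_related=False):
    """Yield (kind, part, in_related) for every possible body part."""
    ctype = part.get_content_type()
    if ctype == "text/plain":
        yield "text", part, in_related
    elif ctype == "text/html":
        yield "html", part, in_related
    elif ctype == "multipart/alternative":
        for sub in part.get_payload():
            yield from _candidates(sub, in_related)
    elif ctype in ("multipart/related", "multipart/mixed"):
        if ctype == "multipart/related":
            yield "related", part, in_related
            in_related = True
        subparts = part.get_payload()
        if subparts:
            # Assumption: the first subpart is the body / root part.
            yield from _candidates(subparts[0], in_related)


def find_body(msg, preferencelist=("related", "html", "text")):
    if not {"text", "html"} & set(preferencelist):
        raise ValueError("preferencelist needs 'text' or 'html'")
    found = {}
    for kind, part, nested in _candidates(msg):
        # A stand-alone part beats one buried inside a related.
        if kind not in found or (found[kind][1] and not nested):
            found[kind] = (part, nested)
    for kind in preferencelist:
        if kind in found:
            return found[kind][0]
    return None


text = MIMEText("My test [image1]\n")
html = MIMEText('<p>My test <img src="cid:image1"></p>\n', "html")
image = MIMEImage(b"fake image bytes", "png")  # placeholder data
rel = MIMEMultipart("related", _subparts=[html, image])
alt = MIMEMultipart("alternative", _subparts=[text, rel])
msg = MIMEMultipart("mixed", _subparts=[alt])
```

With this message, the default preference list selects the whole ``related`` part, while ``('html', 'text')`` reaches inside it for the ``html`` part.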

Complementing ``get_body``, I propose an ``iter_attachments`` method, which
would return an iterator over all of the parts that are *not*
``multipart/alternative``, ``multipart/related``, or the first ``text`` (or
``html``) part in a ``multipart/mixed``.  A non-\ ``multipart`` part would
return an empty iterator.  (Note that it is intentional that calling this on a
``multipart/related`` will return the ``related`` parts as attachments.  I
think this is the most useful semantic, but it is certainly open for
discussion.)
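
A minimal sketch of that rule, written against the existing API, might look like this.  The function name is hypothetical, and two readings are assumed: the "first text part" exclusion applies only inside a ``multipart/mixed``, and it applies only if no ``alternative`` or ``related`` body has already been seen.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def iter_attachments_sketch(msg):
    """Yield the direct subparts of msg that are not the body."""
    if msg.get_content_maintype() != "multipart":
        return  # a non-multipart yields nothing
    in_mixed = msg.get_content_subtype() == "mixed"
    body_seen = False
    for part in msg.get_payload():
        ctype = part.get_content_type()
        if ctype in ("multipart/alternative", "multipart/related"):
            body_seen = True  # assumption: these are the body
            continue
        if in_mixed and not body_seen and ctype in ("text/plain", "text/html"):
            body_seen = True  # first text part of a mixed is the body
            continue
        yield part


text = MIMEText("My test\n")
html = MIMEText("<p>My test</p>\n", "html")
alt = MIMEMultipart("alternative", _subparts=[text, html])
pdf = MIMEApplication(b"placeholder bytes", "pdf")
msg = MIMEMultipart("mixed", _subparts=[alt, pdf])

note = MIMEText("see attached\n")
doc = MIMEApplication(b"more placeholder bytes", "pdf")
simple = MIMEMultipart("mixed", _subparts=[note, doc])
```

Under the literal reading used here, calling the function on a ``multipart/related`` yields every subpart, including the root ``html`` part.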

A bit more tentatively, I'd also like to propose an ``iter_parts`` method that
would return an iterator over all of the parts of any ``multipart``, and return
``None`` on a non-\ ``multipart``.  This is equivalent to what ``get_payload``
currently returns for a ``multipart``, but I have a (long?) term goal of
deprecating ``get_payload``.

The ``walk`` method can still be used to walk more complicated message
structures, if needed, but I suspect most programs will use ``get_body`` and
``iter_attachments``, and then do some sort of recursion if an attachment turns
out to be a ``multipart``.

What about ``get_content`` on a ``multipart``?  The obvious thing would be to
raise an error, but...calling ``get_content`` on a ``multipart/related`` using
the ``FileManager`` could actually be given a meaning:  parsing the html using
standard library tools, sanitizing it, and replacing the cid references with
references to the related parts where they were placed on disk, such that if the
filename returned were passed to a web browser, it could actually display the
content.

I doubt that I am going to provide such a routine at this point, but I want to
allow for the possibility of such a routine being written.  Therefore it is the
responsibility of the content manager to throw an error if it cannot satisfy a
``get_content`` call on a ``multipart``, and the provided content managers will
do so.

So that handles the "get" side of things.

For *creating* messages, we need to build up an example of our conceptual model
message:  provide a body and one or more attachments.

There is a corresponding ``set_content`` possibility for ``multipart/related``.
One could pass in a web page and have the program parse it to find the linked
resources and include them as parts in the ``related``, computing ``cid``\ s as
it goes.  In that specific case the ``set_content`` method would be able to
figure out that the part should be created as a ``multipart/related``.

Being able to figure out the ``multipart`` subtype from the input data can only
be done in that specific case, though.  Otherwise we have a list of parts, and
how they relate to each other cannot be known a-priori.  So we need to tell
``set_content`` what the relationship is, by explicitly specifying the subtype.

Thus for creating ``multipart``\ s, all of the above content managers support
the following syntax:

    set_content(partslist, subtype, boundary=None, params=None)

This should look kind of familiar, since it mimics the existing
``MIMEMultipart`` constructor, albeit with a slightly different parameter
order.  The *partslist* is a ``list`` of ``Message`` objects with their
content already set.
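
For comparison, this is what the existing constructor being mimicked looks like in use.  Only the part contents are invented; the calls are existing API.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

parts = [
    MIMEText("My test\n"),                 # text/plain
    MIMEText("<p>My test</p>\n", "html"),  # text/html
]

# Existing API: the subtype comes first and the parts list comes later,
# the reverse of the proposed set_content(partslist, subtype, ...).
alt = MIMEMultipart("alternative", _subparts=parts)
```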

To build a multipart message in this way, you do have to understand a bit about
MIME message structure.  You have to know that the outermost part should be a
``multipart/mixed``, and that its first part should be a
``multipart/alternative`` and its other parts the message attachments.

Can we do better?  Again, I think so.

It seems to me that a more natural way to form a message would be something
like this::

   >>> from email.message import MIMEMessage
   >>> from email.contentmanager import FileManager
   >>> msg = MIMEMessage()
   >>> msg['To'] = 'email-sig@python.org'
   >>> msg['From'] = 'rdmur...@bitdance.com'
   >>> msg['Subject'] = 'A sample basic modern email'
   >>> msg.set_content('My test [image1]\n')
   >>> rel = MIMEMessage()
   >>> rel.set_content('<p>My test <img src="cid:image1"></p>\n',
   ...                 'html')
   >>> rel.add_related('myimage.jpg',
   ...                 cid='image1', content_manager=FileManager)
   >>> msg.make_alternative()
   >>> msg.add_alternative(rel)
   >>> msg.add_attachment('spreadsheet.xls',
   ...                    content_manager=FileManager)

The idea here is that calling ``add_related`` converts a non-\ ``multipart``
message into a ``multipart/related`` message, moving the original content to a
new part and making it the first part in the new ``multipart``.  Similarly,
``make_alternative`` converts to a ``multipart/alternative``, and
``add_attachment`` converts to a ``multipart/mixed``.  Any of these methods is
valid on any non-\ ``multipart`` part, but on ``multipart`` types only some
are valid.  The full matrix is:

    =====================   ============================================
    Type                    Valid Methods
    =====================   ============================================
    non-multipart           add_related, add_alternative, add_attachment
                            make_related, make_alternative, make_mixed
    related                 add_related, add_alternative, add_attachment
                            make_alternative, make_mixed
    alternative             add_alternative, add_attachment, make_mixed
    mixed                   add_attachment
    =====================   ============================================

That is, you can promote from ``related`` to ``alternative`` or ``mixed``, and
from ``alternative`` to ``mixed``, but you can only promote, not demote.  This
scheme seems to me to provide a natural way of building up messages from their
component parts, without having to think too much about the actual MIME
structure.  If you get it wrong, you get an error.
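
The matrix lends itself to a simple table-driven check.  In the sketch below the names ``VALID_METHODS`` and ``check_allowed`` are hypothetical, and treating any other ``multipart`` subtype as allowing nothing is an assumption of the sketch.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# The promotion matrix from the table, keyed by the current kind of part.
VALID_METHODS = {
    "non-multipart": {"add_related", "add_alternative", "add_attachment",
                      "make_related", "make_alternative", "make_mixed"},
    "related": {"add_related", "add_alternative", "add_attachment",
                "make_alternative", "make_mixed"},
    "alternative": {"add_alternative", "add_attachment", "make_mixed"},
    "mixed": {"add_attachment"},
}


def check_allowed(msg, method):
    """Raise ValueError if method would demote msg."""
    if msg.get_content_maintype() == "multipart":
        kind = msg.get_content_subtype()
    else:
        kind = "non-multipart"
    # Assumption: unknown multipart subtypes allow none of the methods.
    if method not in VALID_METHODS.get(kind, set()):
        raise ValueError(f"{method} is not valid on a {kind} part")


text_part = MIMEText("hello\n")
mixed_part = MIMEMultipart("mixed")

check_allowed(text_part, "make_related")  # fine: promotion
try:
    check_allowed(mixed_part, "make_alternative")  # demotion
except ValueError as exc:
    print(exc)
```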

I think this is reasonably elegant, but it is just a slight bit magical, so I
won't be surprised if I get some pushback on it.  I think you will at least
agree that it is much shorter than the same example shown earlier using the
existing API.

We can make it even shorter by using a helper class for ``related``.  We
can provide a ``Webpage`` helper class whose constructor takes a string or
file-like object providing the html, and a dictionary mapping content ids to
objects.  The content manager can construct a complete ``multipart/related``
from this object::

   >>> from email.message import MIMEMessage
   >>> from email.contentmanager import FileManager
   >>> msg = MIMEMessage()
   >>> msg['To'] = 'email-sig@python.org'
   >>> msg['From'] = 'rdmur...@bitdance.com'
   >>> msg['Subject'] = 'A sample basic modern email'
   >>> msg.set_content('My test [image1]\n')
   >>> rel = Webpage('<p>My test <img src="cid:image1"></p>\n',
   ...               {'image1': Image('myimage.jpg')})
   >>> msg.add_alternative(rel)
   >>> msg.add_attachment('spreadsheet.xls',
   ...                    content_manager=FileManager)

In an ideal world we'd take it one step further, and have a parsing content
manager that could automatically compute the text version of a ``related`` part
as well::

   >>> from email.message import MIMEMessage
   >>> from email.contentmanager import FileManager
   >>> msg = MIMEMessage()
   >>> msg['To'] = 'email-sig@python.org'
   >>> msg['From'] = 'rdmur...@bitdance.com'
   >>> msg['Subject'] = 'A sample basic modern email'
   >>> body = Webpage('<p>My test <img src="cid:image1"></p>\n',
   ...                {'image1': Image('myimage.jpg')})
   >>> msg.set_content(body)
   >>> msg.add_attachment('spreadsheet.xls',
   ...                    content_manager=FileManager)

That will need to be provided (at least initially) by a third party extension,
though, since parsing and munging html into text is a non-trivial project all
by itself.


Feedback Time
-------------

So there you have it.  The distillation of four days of intense design
thinking (I get much more exercise in the design phases of a project than
any other time).  Go ahead and tear it apart on the email-sig mailing
list.  Hopefully it won't wind up in *too* many small pieces :)