Mail from ILUG-BOM list (Non-Digest Mode)

This is gonna be like a small info bulletin on how attachments are sent.

When you send mail using the SMTP protocol, you basically have only the
following commands:

HELO              - Greet the mail server.  Used once per session - at the
                    beginning of the session
MAIL FROM: <from> - Announce who the sender is.  Used once per mail,
                    before specifying any recipients for each mail, or
                    after a RSET
RCPT TO: <rcpt>   - Announce who the mail is to.  Multiple recipients are
                    allowed, each must have its own RCPT TO:
                    entered immediately after a MAIL FROM:
DATA              - Starts mail entry mode.  Everything entered on the
                    lines following DATA is treated as the content of the
                    message and is sent to the recipients.  The DATA
                    terminates with a . (period) on a line by itself.
                    A mail may be queued or sent immediately when the . is
                    entered.  It cannot however be reset at this stage.
RSET              - Reset the state of the current transaction.  
                    The MAIL FROM: and RCPT TO: for the current
                    transaction are cleared.
QUIT              - End the session.  No commits happen here.
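
To see the order of the commands above in one place, here is a rough
sketch in Python.  The helper name smtp_commands and the example.org
addresses are made up for illustration.  It only builds the lines a client
would type; it does not open a connection or look at the server's replies.

```python
# Hypothetical helper: lists the lines a client would send for ONE mail.
# The addresses are written as <addr> directly after the colon.
def smtp_commands(client_host, sender, recipients, message_lines):
    commands = ["HELO " + client_host]             # once per session
    commands.append("MAIL FROM:<" + sender + ">")  # once per mail
    for rcpt in recipients:                        # one RCPT TO: per recipient
        commands.append("RCPT TO:<" + rcpt + ">")
    commands.append("DATA")                        # start mail entry mode
    commands.extend(message_lines)                 # the mail itself
    commands.append(".")                           # lone period ends DATA
    commands.append("QUIT")                        # end the session
    return commands


session = smtp_commands(
    "client.example",
    "alice@example.org",
    ["bob@example.org", "carol@example.org"],
    ["Hello there."],
)
```

Printing session one item per line gives roughly what you would type into
a telnet session, minus the server's replies.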

I'll deal with return codes later (maybe a different mail).

All other commands simply extend the usage of these.  If you look at the
commands above, you will notice a few things.

1. Almost all commands need an argument.  The exceptions are RSET and
   QUIT.
2. DATA takes a multiline argument on the line after DATA has been
   acknowledged.
3. If multiline arguments (like DATA) terminate with a . on a line by
   itself, then how do you enter a . on a line by itself?
4. What about the other header fields?  Subject:, Reply-To:, custom
   headers etc?
5. There is no provision for attaching files.

Let's now look at these points.

1. HELO, MAIL FROM: and RCPT TO: tell the SMTP server all it needs to know
   to get the mail through, and bounce it in case of error.  Actually,
   many servers will accept mail without HELO, but the RFC expects it as
   the first command, and it ensures that the client and server are
   speaking the same language.  (Ref: EHLO)

2. DATA tells the SMTP server what the contents of the mail are.  If you
   read through /var/spool/mail/$USER or the equivalent on your system, you
   will find it full of what was typed as DATA, with maybe a few lines
   added by the SMTP server and mail client.

3. Escaping a . is the same as escaping a \ in C.  Precede it with another
   .  In fact, the SMTP protocol states that any line that starts with a .
   should be preceded by another .
   So if I wanted to say this:
      . is a period
   I would have to enter this:
      .. is a period
   The receiving SMTP server is supposed to translate .. at the start of a
   line back into .
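
The period escaping described in point 3 can be sketched in a few lines of
Python.  The function names dot_stuff and dot_unstuff are made up for this
example, and the message is assumed to be already split into lines.

```python
def dot_stuff(lines):
    """Sending side: put an extra . in front of every line starting with a ."""
    return ["." + line if line.startswith(".") else line for line in lines]


def dot_unstuff(lines):
    """Receiving side: drop the first . of every line starting with a ."""
    return [line[1:] if line.startswith(".") else line for line in lines]


original = [". is a period", "an ordinary line", "."]
stuffed = dot_stuff(original)      # what actually goes over the wire
restored = dot_unstuff(stuffed)    # what ends up in the mailbox
```

After stuffing, no line of the message is a lone period any more, so the
only lone period the server sees is the one that ends DATA.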

4. If you send a mail directly using the SMTP protocol (telnet to the smtp
   server), and simply type the body of the message, then all you receive
   is the body along with the Date: and From: fields.  Try it.  All other
   fields have to be entered as part of the body.  In fact, you could
   override the default Date and From fields in the body of the mail.
   These two fields serve a dual purpose, acting as what is known as the
   UNIX_FROM_LINE in your mail file.  At the start of every mail, you will
   see a line like this:
       From philip Mon May 22 17:35:13 2000

   To enter other fields, just enter them as the first part of your DATA,
   separated from the actual body by a completely blank line.  Not even a
   space is allowed on this line.  e.g.:

     DATA
     354 Enter Data end with .
     From: [EMAIL PROTECTED]
     To: [EMAIL PROTECTED]
     Subject: SMTP and Mail Attachments - [informational]

     This is gonna be like a small info bulletin on how...
     ...
     .
     250 Mail accepted for delivery

   When the mail is received, the MUA separates the header from the rest
   of the body.
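
The header/body split at the first blank line is easy to try out with the
email package from Python's standard library.  A small sketch; the
addresses and the message text are placeholders, not taken from a real
mail.

```python
from email import message_from_string

# Headers first, then one completely blank line, then the body.
raw = (
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Subject: SMTP and Mail Attachments\n"
    "\n"
    "This is the body.\n"
)

msg = message_from_string(raw)
subject = msg["Subject"]      # looked up in the header part
body = msg.get_payload()      # everything after the blank line
```

Here subject holds the Subject: text and body holds the single body line.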

5. So, now we come to the final question.  Attachments.

   Attachments may be binary or text files.  We do not know, nor should we
   care.  The main question is how do we get 8 bit data across all
   networks.  Note that the Internet is a collection of heterogeneous
   networks.  Most speak TCP/IP, some speak X.25, some speak even more
   obscure and outdated protocols.  Some of these protocols have a 6 bit
   character set - 64 characters that may pass through their networks.  We
   must be able to get all our data through in these 64 characters.  

   Fortunately though, these are the most used characters in emails - 
     A-Z, a-z, 0-9, +/
   Unfortunately though, we also have 192 other characters that are used,
   though not as often.

   Now, most of the networks on the Internet can actually handle 7 bit
   ASCII data, so we don't really worry about it too much.  With plain
   text that is.  When you're sending attachments, you probably want it to
   get through correctly.  We therefore need to encode our 8 bit
   attachment into 7 or 6 bits.  We figure, since we're gonna encode it
   anyway, let's go with 6 bits and cover the whole net.

   We use what is called Base64 encoding and I am not going to go into it
   here.  There are other encoding formats - quoted printable being very
   common - that only encode characters that are outside the printable
   7 bit ASCII set.  Base64 is the most used for attachments though.
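
Without going into how Base64 works internally, a short Python sketch
shows the effect, using the standard base64 and quopri modules.  The
sample bytes are arbitrary.

```python
import base64
import quopri

raw = bytes([0xFF, 0x00, 0x80])       # three bytes that are not 7 bit safe

# Base64 turns every 3 bytes (24 bits) into 4 characters of 6 bits each.
encoded = base64.b64encode(raw)       # b"/wCA"
decoded = base64.b64decode(encoded)   # back to the original bytes

# Quoted-printable leaves plain ASCII alone and escapes the rest as =XX.
qp = quopri.encodestring(b"caf\xe9")
qp_back = quopri.decodestring(qp)
```

The Base64 output only uses A-Z, a-z, 0-9, + and / (plus = as padding at
the end when the input length is not a multiple of 3), at the cost of
growing the data by about a third.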

   Now, how do we get the attachment into the mail?  Assuming that we have
   already encoded it, two things remain.

   i)  Add appropriate headers to the mail so that the MUA knows that
       there are attachments and where they can be found.
   ii) Add headers to the attachments that will allow the MUA to properly
       decode and save the attached file.

   The main mail /must/ contain the following headers:
      MIME-Version: 1.0         
      Content-Type: multipart/mixed;    
          boundary="<boundary>"

   A note about mail headers... any line that starts with white space is
   appended to the preceding line.  Thus, boundary is actually part of
   the Content-Type.  The <boundary> is a random string of base64 allowed
   characters, chosen so that it does not occur anywhere in the mail itself.
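
The rule about continuation lines can be sketched like this in Python.
The function name unfold and the boundary value XyZ123 are made up, and
joining with a single space is a simplification.

```python
def unfold(header_lines):
    """Append every line that starts with white space to the line before it."""
    unfolded = []
    for line in header_lines:
        if line[:1] in (" ", "\t") and unfolded:
            unfolded[-1] += " " + line.strip()   # continuation line
        else:
            unfolded.append(line)                # start of a new header
    return unfolded


headers = unfold([
    "MIME-Version: 1.0",
    "Content-Type: multipart/mixed;",
    '    boundary="XyZ123"',
])
```

The three input lines come out as two headers, with boundary sitting
inside Content-Type.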

   You may also want to add a Content-Disposition: mentioning that the
   attachments are attached inline with their own headers:

     Content-Disposition: inline

   Now, each attachment /must/ start with this same boundary, preceded by
   two hyphens --

      --<boundary>
      Attachment header

      encoded attachment

   After the final attachment comes the boundary, preceded and followed by
   two hyphens:
      --<boundary>--
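
Putting the boundary rules together by hand, a rough Python sketch.  The
helper name build_multipart and the boundary XyZ123 are invented for the
example, and the two payloads are just the words "hello" and "world"
already Base64 encoded.

```python
def build_multipart(boundary, parts):
    """parts is a list of (header_lines, encoded_body) pairs."""
    lines = []
    for part_headers, encoded_body in parts:
        lines.append("--" + boundary)        # every part starts with --boundary
        lines.extend(part_headers)           # the attachment header
        lines.append("")                     # blank line ends the header
        lines.append(encoded_body)           # the encoded attachment
    lines.append("--" + boundary + "--")     # closing boundary after the last part
    return "\n".join(lines)


body = build_multipart("XyZ123", [
    (["Content-Transfer-Encoding: base64"], "aGVsbG8="),
    (["Content-Transfer-Encoding: base64"], "d29ybGQ="),
])
```

The result is what would go into DATA after the main mail headers and the
blank line that follows them.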

   Simple so far, right?

   Now, the attachment header.
   Immediately after the boundary, you enter the attachment header.  The
   only required field is the Content-Transfer-Encoding: which tells the
   MUA what was used to encode the data.
   There are also the Content-Type and the Content-Disposition fields.
   Content-Type tells the MUA what the original MIME type of the
   attachment was, and Content-Disposition tells it to treat the part as
   an attachment.  Both can also mention the original file name.

      Content-Transfer-Encoding: base64
      Content-Type: text/plain;
          name="<filename>"
      Content-Disposition: attachment;
          filename="<filename>"

      <Encoded attachment>

   As before, leave a blank line, and enter the attachment.
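
In practice you rarely write these headers by hand.  As one possible
illustration, Python's standard email package produces the same layout.
The addresses, the file name sample.bin and the four data bytes are
placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.org"
msg["To"] = "bob@example.org"
msg["Subject"] = "File attached"
msg.set_content("See the attached file.")

# Binary data gets Base64 encoded and wrapped in its own attachment header.
data = bytes([0x00, 0xFF, 0x10, 0x80])
msg.add_attachment(
    data,
    maintype="application",
    subtype="octet-stream",
    filename="sample.bin",
)

attachment = next(msg.iter_attachments())
wire_text = msg.as_string()   # headers, boundaries and encoded parts
```

Printing wire_text shows the multipart/mixed Content-Type with its
boundary, followed by the text part and the Base64 encoded attachment.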

   That pretty much looks like the end, except for a small addition.  Your
   attachment itself could be a multipart attachment, in which case it
   would have a multipart/mixed mime type, and a second boundary and
   attachments under it.  So think of it as being pretty recursive.
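
The recursive structure can be seen by parsing a small hand-written
message with Python's email package and walking over its parts.  The
boundaries "outer" and "inner" and the texts are made up for the example.

```python
from email import message_from_string

raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="outer"\n'
    "\n"
    "--outer\n"
    "Content-Type: text/plain\n"
    "\n"
    "Top-level text.\n"
    "--outer\n"
    'Content-Type: multipart/mixed; boundary="inner"\n'
    "\n"
    "--inner\n"
    "Content-Type: text/plain\n"
    "\n"
    "Nested text.\n"
    "--inner--\n"
    "--outer--\n"
)

msg = message_from_string(raw)

# walk() visits the mail itself, then each part, descending into
# any part that is itself multipart.
types = [part.get_content_type() for part in msg.walk()]
```

The second part of the outer mail is itself a multipart/mixed with its own
boundary, and walk() goes down into it.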

   Looking at it now, we could probably consider our entire mailbox file
   as a single mail with each mail being an attachment and the
   UNIX_FROM_LINE being the boundary.
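
To play with that idea, here is a naive Python sketch that splits a
mailbox file on its UNIX_FROM_LINEs.  The function name split_mbox and the
sample text are made up, and the sketch assumes no body line starts with
"From ", which real mailbox files have to guard against.

```python
def split_mbox(text):
    """Split mailbox text into mails, one list of lines per mail."""
    mails = []
    for line in text.splitlines():
        if line.startswith("From "):     # a UNIX_FROM_LINE starts a new mail
            mails.append([line])
        elif mails:                      # anything else belongs to the current mail
            mails[-1].append(line)
    return mails


mailbox = (
    "From philip Mon May 22 17:35:13 2000\n"
    "Subject: first\n"
    "\n"
    "Body one.\n"
    "From philip Mon May 22 17:40:00 2000\n"
    "Subject: second\n"
    "\n"
    "Body two.\n"
)
mails = split_mbox(mailbox)
```

Each entry in mails starts with its own UNIX_FROM_LINE, much like each
attachment starts with its boundary.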


Notes:  MUA - Mail User Agent (Netscape, PINE, mutt etc).
        SMTP - Simple Mail Transfer Protocol.
        EHLO - Extended HELO - the successor to HELO.
        MIME - Multipurpose Internet Mail Extensions

References: 
        RFC 821       - SMTP
        RFC 822       - Message headers
        RFC 2045-2049 - MIME

Hope ya'll found this interesting.  I'll put it up on my page soon.

Philip

-- 
Unfair animal names:

-- tsetse fly                   -- bullhead
-- booby                        -- duck-billed platypus
-- sapsucker                    -- Clarence
                -- Gary Larson

