>Perhaps best to find "image/jpeg" and then find the blank line before
>the start of the base64 with "^-^-" ?

This would suit my purposes fine, but it would not do for a universal 
function.

>Can we find the end of the base64 by looking for the blank line after?

It seems the blank line before and the blank line after are pretty 
standard, but I'm not sure whether the MIME standard actually requires 
them.

>I suppose, we would like to get the image file name as well?

Not necessary for me, but again a universal function should satisfy all 
possibilities.

The following comes from RFC 1521 at 

http://www.cis.ohio-state.edu/htbin/rfc/rfc1521.html

The reference is confusing, because it first says that line breaks 
cannot be relied upon, but then its example of a multipart message 
always shows a blank line before each base64-encoded body.

Perhaps someone can come up with a good parsing scheme based on the 
following?
--------------------------

"The output stream (encoded bytes) must be represented in lines of no 
more than 76 characters each... Special processing is performed if 
fewer than 24 bits are available at the end of the data being encoded.  
A full encoding quantum is always completed at the end of a body.  When 
fewer than 24 input bits are available in an input group, zero bits are 
added (on the right) to form an integral number of 6-bit groups.  
Padding at the end of the data is performed using the '=' character.  
Since all base64 input is an integral number of octets, only the 
following cases can arise: (1) the final quantum of encoding input is 
an integral multiple of 24 bits; here, the final unit of encoded output 
will be an integral multiple of 4 characters with no "=" padding, (2) 
the final quantum of encoding input is exactly 8 bits; here, the final 
unit of encoded output will be two characters followed by two "=" 
padding characters, or (3) the final quantum of encoding input is 
exactly 16 bits; here, the final unit of encoded output will be three 
characters followed by one "=" padding character.

"Because it is used only for padding at the end of the data, the 
occurrence of any '=' characters may be taken as evidence that the end 
of the data has been reached (without truncation in transit). No such 
assurance is possible, however, when the number of octets transmitted 
was a multiple of three.

"The following guidelines may be useful to anyone devising a data 
format (Content-Type) that will survive the widest range of networking 
technologies and known broken MTAs unscathed: 

"(1) Under some circumstances the encoding used for data may change as 
part of normal gateway or user agent operation. In particular, 
conversion from base64 to quoted-printable and vice versa may be 
necessary. This may result in the confusion of CRLF sequences with line 
breaks in text bodies. As such, the persistence of CRLF as something 
other than a line break must not be relied on.

"(2) Many systems may elect to represent and store text data using 
local newline conventions. Local newline conventions may not match the 
RFC822 CRLF convention -- systems are known that use plain CR, plain 
LF, CRLF, or counted records.  The result is that isolated CR and LF 
characters are not well tolerated in general; they may be lost or 
converted to delimiters on some systems, and hence must not be relied 
on."
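
To make the padding rules and the line-break warning above concrete, 
here is a minimal Python sketch. The helper names padding_for and 
decode_loose are my own, not anything from the RFC. It assumes the 
base64 text has already been cut out of the message, and it throws 
away all whitespace before decoding so that it does not matter 
whether the lines end in CR, LF or CRLF.

```python
import base64


def padding_for(n_octets: int) -> int:
    """Number of '=' characters base64 appends for n_octets of input.

    0 when the length is a multiple of 3 (case 1 in the RFC text),
    2 when one octet is left over (case 2),
    1 when two octets are left over (case 3).
    """
    return (3 - n_octets % 3) % 3


def decode_loose(text: str) -> bytes:
    """Decode base64 after discarding every whitespace character.

    str.split() with no arguments splits on any run of whitespace,
    so CR, LF, CRLF, tabs and spaces all disappear before decoding.
    """
    compact = "".join(text.split())
    return base64.b64decode(compact)


if __name__ == "__main__":
    # Show the three padding cases for inputs of 3, 1 and 2 octets.
    for n in (3, 1, 2):
        encoded = base64.b64encode(b"x" * n).decode("ascii")
        print(n, encoded, padding_for(n))
```

Note that a missing "=" tells you nothing: as the RFC says, data whose 
length is a multiple of three octets carries no padding at all, so the 
padding cannot serve as a general end-of-data marker.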

What follows is the outline of a complex multipart message.  This 
message has five parts to be displayed serially: two introductory plain 
text parts, an embedded multipart message, a richtext part, and a 
closing encapsulated text message in a non-ASCII character set.

The embedded multipart message has two parts to be displayed in 
parallel, a picture and an audio fragment. 

      MIME-Version: 1.0 
      From: Nathaniel Borenstein <[EMAIL PROTECTED]> 
      To: Ned Freed <[EMAIL PROTECTED]> 
      Subject: A multipart example 
      Content-Type: multipart/mixed; 
           boundary=unique-boundary-1 

      This is the preamble area of a multipart message. 
      Mail readers that understand multipart format 
      should ignore this preamble. 
      If you are reading this text, you might want to 
      consider changing to a mail reader that understands 
      how to properly display multipart messages. 
      --unique-boundary-1 

         ...Some text appears here... 
      [Note that the preceding blank line means 
      no header fields were given and this is text, 
      with charset US ASCII.  It could have been 
      done with explicit typing as in the next part.] 

      --unique-boundary-1 
      Content-type: text/plain; charset=US-ASCII 

      This could have been part of the previous part, 
      but illustrates explicit versus implicit 
      typing of body parts. 

      --unique-boundary-1 
      Content-Type: multipart/parallel; 
           boundary=unique-boundary-2 


      --unique-boundary-2 
      Content-Type: audio/basic 
      Content-Transfer-Encoding: base64 

         ... base64-encoded 8000 Hz single-channel 
             mu-law-format audio data goes here.... 


      --unique-boundary-2 
      Content-Type: image/gif 
      Content-Transfer-Encoding: base64 

         ... base64-encoded image data goes here.... 

      --unique-boundary-2-- 

      --unique-boundary-1 
      Content-type: text/richtext 

      This is <bold><italic>richtext.</italic></bold> 
      <smaller>as defined in RFC 1341</smaller> 
      <nl><nl>Isn't it 
      <bigger><bigger>cool?</bigger></bigger> 

      --unique-boundary-1 
      Content-Type: message/rfc822 

      From: (mailbox in US-ASCII) 
      To: (address in US-ASCII) 
      Subject: (subject in US-ASCII) 
      Content-Type: Text/plain; charset=ISO-8859-1 
      Content-Transfer-Encoding: Quoted-printable 

         ... Additional text in ISO-8859-1 goes here ... 

      --unique-boundary-1--
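
As a possible starting point for the universal function, here is a 
rough sketch in Python using the standard library's email package, 
which splits on the declared boundary strings rather than on blank 
lines. The function name extract_parts and its default argument are 
my own choices for illustration. It assumes the whole message, 
headers included, is available as one string.

```python
from email import message_from_string


def extract_parts(raw: str, wanted_type: str = "image/jpeg"):
    """Return (filename, data) pairs for each part of the wanted type.

    filename is None when the part does not declare one.
    """
    msg = message_from_string(raw)
    found = []
    # walk() visits the message and every nested part, so an image
    # inside an embedded multipart (as in the RFC example) is found too.
    for part in msg.walk():
        if part.get_content_type() == wanted_type:
            # decode=True undoes the Content-Transfer-Encoding
            # (base64 or quoted-printable) and returns bytes.
            data = part.get_payload(decode=True)
            found.append((part.get_filename(), data))
    return found
```

Calling extract_parts(raw, "image/gif") on a message shaped like the 
example above should return one entry for the image part, with the 
file name taken from the part's headers if it has one.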
