On Sun, 12 Nov 2000, Earl Hood wrote:

> On November 11, 2000 at 13:20, anthonyw wrote:
> 
> >  For the msg*.html files that are derived from multipart messages:
> > 
> > 1) Save the boundary markers of the message part 
> 
> This is really not needed since the script can recreate new
> boundaries.
> 
> > 2) Retain the Mime headers of the message parts - those which are not
> >    displayed inline - inside of some X- comment.
> 
> The only header of potential interest is the Content-Type header.
> Other headers can be recreated by the script.  An external file's
> content-type can be implied from the filename extension or by the use
> of file(1) and /etc/magic, or a similar mechanism.
> 
> Note, any back conversion will never be perfect, so it is best to
> minimize the amount of work to whatever can give a passable solution.
> I'd prefer to not over-pollute a page with a bunch of comment
> declarations since it increases the byte size of the page with
> questionable benefit.  Also, the comment declaration approach will not
> work well with MHTML messages since some parts may be decoded but
> referenced within the main HTML part.
> 
> Plus, since there appears to be a desire to back convert messages
> from existing archives, a solution must exist that does not rely
> on comment declarations that do not exist.
> 
> BTW, the list of external files is given at the top of each message
> page, so the URLs in the body can be scanned to see which ones match
> against the list.  One could probably just have all messages
> denoted as multipart in the main content-type comment declaration
> converted into MHTML messages.  All basic content-types should be
> translatable back to something close to the original message
> (wrt the message body).
> 

I was just being lazy about parsing :-). The current script has a section
which reads:

if ($isinbody =~ /true/)
{
  # Strip anchor markup, keeping only the link text
  s/<a\s+href="[^"]*">(.*?)<\/a>/$1/g;
}

and I was too lazy to think of ways to see if the extracted URL referred
to a derived file. I wanted to do the following (which is a lot more work):

  if (/<!--.*bodypart-begin/)    # the bodypart-begin comment
  {
    $isinbodypart = 1;
    next;                         # skip this line
  }
  if ($isinbodypart)
  {
    # The URL is known to refer to a local file
    my ($url) = /<a\s+href="([^"]*)"/;
    # dosomething ...
  }
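The anchor handling used in these fragments can be tried in isolation. This is a minimal sketch, assuming anchors are written as `<a href="...">text</a>` with a double-quoted href on a single line; `extract_href` and `strip_anchors` are hypothetical helper names. It uses `$1` in the replacement (Perl prefers `$n` over `\n` there) and a non-greedy `(.*?)`, so two links on one line are handled separately instead of being swallowed as one match.

```perl
use strict;
use warnings;

# Return the href of the first <a href="..."> on a line, or undef.
sub extract_href {
    my ($line) = @_;
    return $line =~ /<a\s+href="([^"]*)"/ ? $1 : undef;
}

# Replace every <a href="...">text</a> on a line with its link text.
sub strip_anchors {
    my ($line) = @_;
    $line =~ s/<a\s+href="[^"]*">(.*?)<\/a>/$1/g;
    return $line;
}

my $line = 'See <a href="a.png">a.png</a> and <a href="b.pdf">b.pdf</a>.';
print extract_href($line), "\n";   # prints "a.png"
print strip_anchors($line), "\n";  # prints "See a.png and b.pdf."
```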

Now that I have looked at this some more, I will take the following route:

if ($isinbody =~ /true/)
{
  if (/<a\s+href="([^"]*)"/)
  {
    # Capture the URL before the substitution below resets $1
    my $url = $1;

    # Strip the anchor markup, keeping only the link text
    s/<a\s+href="[^"]*">(.*?)<\/a>/$1/g;

    # and for each element in some "xderived" array,
    # compare $url with that element. If there is a match
    # then build the Content-Type info etc. Otherwise push(@body,$_);
  }
}
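A minimal, self-contained sketch of that route follows. Several names are assumptions for illustration only: `@xderived` stands in for the list of derived files given at the top of the message page, `classify_line` and `guess_type` are hypothetical helpers, and the extension table is a small illustrative sample (file(1) and /etc/magic could be used instead, as suggested above). A line whose link points at a derived file is recorded as a part with a guessed Content-Type; any other line is kept in `@body` with its anchor markup stripped.

```perl
use strict;
use warnings;

# Hypothetical derived-file names, as listed at the top of a message page.
my @xderived = ('jpg00001.jpg', 'pdf00002.pdf');
my %is_derived = map { $_ => 1 } @xderived;

# Illustrative extension -> Content-Type table (not exhaustive).
my %type_for_ext = (
    gif => 'image/gif',
    jpg => 'image/jpeg',
    pdf => 'application/pdf',
    txt => 'text/plain',
);

my (@body, @parts);

# Guess a Content-Type from the file name's extension.
sub guess_type {
    my ($file) = @_;
    my ($ext) = $file =~ /\.([^.\/]+)$/;
    return 'application/octet-stream' unless defined $ext;
    return $type_for_ext{lc $ext} // 'application/octet-stream';
}

# Record a derived part, or keep the line (minus anchors) in @body.
sub classify_line {
    my ($line) = @_;
    if ($line =~ /<a\s+href="([^"]*)"/) {
        my $url = $1;
        (my $name = $url) =~ s{.*/}{};   # compare on the file name only
        if ($is_derived{$name}) {
            push @parts, [$url, guess_type($name)];
            return;
        }
    }
    (my $text = $line) =~ s/<a\s+href="[^"]*">(.*?)<\/a>/$1/g;
    push @body, $text;
}

for my $line ('<a href="pdf00002.pdf">pdf00002.pdf</a>',
              'Hello <a href="http://example.com/">world</a>') {
    classify_line($line);
}
print "part: $_->[0] ($_->[1])\n" for @parts;
print "body: $_\n" for @body;
```

Comparing on the file name alone tolerates relative paths in the href; a line containing a derived link is treated entirely as a part reference, which is a simplification.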
      

> --ewh
> 
> 

Regards, 

AnthonyW
