On Sun, 12 Nov 2000, Earl Hood wrote:
> On November 11, 2000 at 13:20, anthonyw wrote:
>
> > For the msg*.html files that are derived from multipart messages:
> >
> > 1) Save the boundary markers of the message part
>
> This is really not needed since the script can recreate new
> boundaries.
>
> > 2) Retain the Mime headers of the message parts - those which are not
> > displayed inline - inside of some X- comment.
>
> The only header of potential interest is the Content-Type header.
> Other headers can be recreated by the script. An external file's
> content-type can be implied from the filename extension or by the use
> of file(1) and /etc/magic, or a similar mechanism.
>
> Note, any back conversion will never be perfect, so it is best to
> minimize the amount of work to whatever can give a passable solution.
> I'd prefer to not over-pollute a page with a bunch of comment
> declarations since it increases the byte size of the page with
> questionable benefit. Also, the comment declaration approach will not
> work well with MHTML messages since some parts may be decoded but
> referenced within the main HTML part.
>
> Plus, since there appears to be a desire to back convert messages
> from existing archives, a solution must exist that does not rely
> on comment declarations that do not exist.
>
> BTW, the list of external files is given at the top of each message
> page, so the body's URLs can be scanned to see which ones match
> against the list. One could probably just have all messages
> denoted as multipart in the main content-type comment declaration
> converted into MHTML messages. All basic content-types should be
> translatable back to something close to the original message
> (wrt the message body).
>
I was just being lazy about parsing :-). The current script has a section
which reads:
    if ($isinbody =~ /true/)
    {
        # Strip the <a href> markup, keeping only the link text
        s/<a href="[^"]*">(.*?)<\/a>/$1/g;
    }
and I was too lazy to think of ways to see if the extracted URL referred
to a derived file. I wanted to do the following (which is a lot more work):
    test if comment says bodypart-begin
        set isinbodypart=true
        skip this line
    if isinbodypart=true
    then
        extract the url knowing that it refers to a local file
        dosomething ...
    endif
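In Perl, that flag-based scan might look roughly like the sketch below.
The comment markers (<!-- bodypart-begin --> and a matching
<!-- bodypart-end -->), the sub name local_urls and the sample lines are
all made up for illustration; the real pages may mark body parts
differently, if at all.

```perl
use strict;
use warnings;

# Collect the URLs found between the (hypothetical) body part markers.
sub local_urls {
    my @lines = @_;
    my $isinbodypart = 0;
    my @urls;
    for my $line (@lines) {
        if ($line =~ /<!--\s*bodypart-begin\s*-->/) {
            $isinbodypart = 1;
            next;                       # skip the marker line itself
        }
        if ($line =~ /<!--\s*bodypart-end\s*-->/) {
            $isinbodypart = 0;
            next;
        }
        next unless $isinbodypart;
        # Inside a body part, every link is taken to refer to a local file.
        push @urls, $1 while $line =~ /<a href="([^"]*)">/g;
    }
    return @urls;
}

# Invented sample input: one link outside and one inside a body part.
my @urls = local_urls(
    '<a href="http://www.example.com/">outside</a>',
    '<!-- bodypart-begin -->',
    '<a href="bin00001.bin">attachment</a>',
    '<!-- bodypart-end -->',
);
print "$_\n" for @urls;
```

Only the link inside the markers is reported. The catch is that this
depends on markers being present in the page, which is exactly what makes
it more work.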
Now that I have looked at this some more I will go the following route:
    if ($isinbody =~ /true/)
    {
        if (/<a href="([^"]*)">/)
        {
            # Extract the URL, then strip the link markup from the line
            $url = $1;
            s/<a href="[^"]*">(.*?)<\/a>/$1/g;
            # and for each element in some "xderived" array,
            # compare $url with that element. If there is a match
            # then build the content-type info etc. Otherwise push(@body,$_);
        }
    }
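For what it is worth, the comparison against the "xderived" array could be
sketched as follows. This is only a rough idea under assumptions: the sub
name classify_link, the %type_for table and the sample file names are
invented for illustration, and guessing the content-type from the filename
extension is just the simplest of the mechanisms suggested above.

```perl
use strict;
use warnings;

# Hypothetical list of derived files, as it might be read from the
# top of a message page.
my @xderived = ('msg00012/report.pdf', 'msg00012/logo.gif');

# Hypothetical (and deliberately tiny) extension-to-content-type table.
my %type_for = (
    gif => 'image/gif',
    pdf => 'application/pdf',
    txt => 'text/plain',
);

# Return (url, content-type) if the line links to a derived file,
# otherwise return an empty list.
sub classify_link {
    my ($line) = @_;
    return unless $line =~ /<a href="([^"]*)">/;
    my $url = $1;
    return unless grep { $_ eq $url } @xderived;
    my ($ext) = $url =~ /\.([^.\/]+)$/;
    my $type = defined $ext && exists $type_for{lc $ext}
        ? $type_for{lc $ext}
        : 'application/octet-stream';   # fallback for unknown extensions
    return ($url, $type);
}

# Invented sample lines: one derived file, one ordinary web link.
my @body;
for my $line (
    '<a href="msg00012/logo.gif">logo.gif</a>',
    '<a href="http://www.example.com/">a web page</a>',
) {
    if (my ($url, $type) = classify_link($line)) {
        print "derived: $url ($type)\n";
    } else {
        push @body, $line;              # not a derived file, keep as body text
    }
}
```

Links that do not match anything in the list fall through to @body
unchanged, so ordinary URLs in the message text are left alone.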
> --ewh
>
>
Regards,
AnthonyW