Package: html2text
Severity: wishlist

Hello Maintainer,

On the mutt mailing list there is a short conversation about the Outlook
HTML problem, which concerns html2text too and leads to double spacing.

Thanks, Greetings and nice Day/Evening
    Michelle Konzack
    Systemadministrator


----- Forwarded message from Aaron Toponce <[email protected]> -----

Date: Wed, 4 May 2011 22:07:41 -0600
From: Aaron Toponce <[email protected]>
To: [email protected]
Subject: Re: Dealing with Outlook 2007 HTML messages
User-Agent: Mutt/1.5.21 (2010-09-15)

On Wed, May 04, 2011 at 06:42:12PM -0600, Aaron Toponce wrote:
> On Wed, May 04, 2011 at 02:07:37PM -0700, Ray Van Dolson wrote:
> > Given Outlook 2007's penchant for using <p> tags instead of <br> tags
> > when a user hits enter (ref[1]), I'm curious how some of you deal with
> > this short of viewing only the text/plain portion of the email.
> >
> > I use w3m as my viewer and have tried passing the Outlook HTML through
> > "tidy" first.  It cleans up the HTML a bit, but leaves the <p> tags
> > intact resulting in unsightly and unnecessary double spacing
> > throughout.
> >
> > Anyone written an Outlook 2007 aware sanitizer that cleans things up?
> > Other ideas?
> >
> > [1] http://blog.egriffin.net/2008/05/eliminating-extra-line-spacing-in-ms.html
>
> I haven't been particularly bothered enough by it. In fact, I see it so
> rarely, I had to send a lorem ipsum mail to myself from my work account,
> just to see the problem.
>
> I'm probably like most, and pipe all my HTML mail through elinks, and it
> seems to do fine (minus the annoying mails from retailers who use images to
> convey their message, and horrible table-like formatting). It doesn't
> address your issue though.
>
> So, if a solution is found, and the hack is simple enough to implement, I'm
> interested. Please post your solution to the list if you find one.

So, I've been bothered by this now, thanks to you, so I've been looking at
solutions. So far, I have ideas, but nothing implemented. I've also learned
why the extra line breaks exist.

If you look at the HTML of the message sent from an Outlook 2007 client,
you'll notice the paragraphs are wrapped in tags like this:

    <p>Lorem ipsum<o:p></o:p></p>

Notice the extra <o:p></o:p> tags in the middle of your paragraph tags. I
believe those are being handled as standard <p> tags, and thus giving you
the extra line break. If you press Shift+Enter in Outlook 2007, rather than
just <Enter> for your line breaks, you can avoid this issue.

Of course, that doesn't help those who are sending you the mail. Thus, here
are some ideas for filtering those <o:p> tags out:

    * Use something like fdm(1) to fetch your mail, then it's trivial to
      filter those tags out.
    * Parse the HTML with sed(1), removing the tags before sending it to an
      HTML renderer.

Thus, the original message remains intact. If there is an HTML renderer
that the message can be piped to, that has strict standards compliance, or
can specifically ignore Microsoft Office namespace tags, such as <o:p>,
then that would be the prime option, I think. I don't know if such actually
exists though.
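
The second idea above (stripping the tags before the HTML reaches a
renderer) can be sketched as a small stdin-to-stdout filter. This is only a
minimal sketch, not a tested fix: the function name, the pattern, and the
script name are made up for illustration, and it rests on the same unverified
assumption that the <o:p> tags are what causes the double spacing.

```python
import re
import sys

# Hypothetical pattern: matches opening, closing, and self-closing tags in the
# Microsoft Office "o:" namespace, e.g. <o:p>, </o:p>, <o:p/>.
# Ordinary tags such as <p> or <ol> are left alone.
OFFICE_TAG = re.compile(r"</?o:[A-Za-z0-9]+\b[^>]*>", re.IGNORECASE)


def strip_office_tags(html: str) -> str:
    """Return html with the o:-namespace tags removed; tag contents are kept."""
    return OFFICE_TAG.sub("", html)


# Only act as a filter when input is piped in, so importing or running the
# file interactively does not block waiting on the terminal.
if __name__ == "__main__" and not sys.stdin.isatty():
    sys.stdout.write(strip_office_tags(sys.stdin.read()))
```

Saved under a hypothetical name such as strip_office_tags.py, it could sit in
front of a renderer, for example:
python3 strip_office_tags.py < message.html | w3m -T text/html -dump
Only the copy handed to the renderer is changed, so the stored message stays
as it was.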

Again, all this rests on the assumption that the <o:p> tags are the
culprit. Anyway, hope that helps a little.

--
. o .   o . o   . . o   o . .   . o .
. . o   . o o   o . o   . o o   . . o
o o o   . o .   . o o   o o .   o o o



----- End forwarded message -----




-- 
Linux-User #280138 with the Linux Counter, http://counter.li.org/
##################### Debian GNU/Linux Consultant #####################
Michelle Konzack                Apt. 917            
Mobile FR: +33/6/61925193       50, rue de Soultz   
Mobile DE: +49/177/9351947      67100 Strasbourg    
Office DE: +49/176/86004575     France              

