Using mu from git master @ ab5830 (0.9.9.6), I would like to filter out unwanted characters from mu's rendering of HTML email. Without resorting to running an external HTML-to-text process for each message view buffer [1] [2], is there a way for mu to a) be more aggressive with its own filter, and b) filter and/or map unwanted characters to an accepted set for display?
Many, but not all, HTML-format emails from correspondents carry these visual artifacts. They occur most often in the citation (reply, forward) header block and in footer blocks. An example, presuming email preserves hat-H (^H) and other characters:

---
F^HFr^Hro^Hom^Hm:^H: First Last
S^HSe^Hen^Hnt^Ht:^H: Thursday, July 03, 2014 7:46 AM
T^HTo^Ho:^H: First Last
C^HCc^Hc:^H: First Last
S^HSu^Hub^Hbj^Hje^Hec^Hct^Ht:^H: The subject

The body text of most HTML messages displays relatively cleanly in mu with the stock configuration. Some messages, or parts of them, are unreadable because of control characters and Unicode punctuation:

_^HA_^Hd_^Hd_^H _^Ha_^Hl_^Hl_^H _^HG_^Ho_^Ha_^Hl_^Hs(wrap)
_^H _^Ht_^Ho_^H _^HC_^Ha_^Hl_^He_^Hn_^Hd_^Ha_^Hr
By clicking above, the following goals will be added to your calendar.
========================================================================
_^HI_^Hm_^Hp_^Hl_^He_^Hm_^He_^Hn_^Ht_^H _^HC_^Ho_^Hn_^Hn_^He_^Hc_^H(wrap)
t_^Ho_^Hr_^H _^H-_^H _^HQ_^H3
Due Date: S^HSe^Hep^Hp,^H, 3^H30^H0 2^H20^H01^H14^H4
========================================================================
_^HL_^Ha_^Hu_^Hn_^Hc_^Hh_^H _^HE_^Hx_^Hc_^Hh_^Ha_^Hn_^Hg_^He.
Due Date: S^HSe^Hep^Hp,^H, 3^H30^H0 2^H20^H01^H14^H4
========================================================================
_^HF_^Hi_^Hn_^Ha_^Hl_^Hi_^Hz_^He_^H _^HP_^Hl_^Ha_^Hn_^H _^Hf_^Ho_^Hr(wrap)
_^H ^H _^HS_^Hi_^Ht_^He_^Hs
Due Date: S^HSe^Hep^Hp,^H, 3^H30^H0 2^H20^H01^H14^H4
_^HU_^Hn_^Hs_^Hu_^Hb_^Hs_^Hc_^Hr_^Hi_^Hb_^He_^H _^Hf_^Hr_^Ho_^Hm_^H (wrap)
_^Ha_^Hl_^Hl_^H _^Hc_^Ha_^Hl_^He_^Hn_^Hd_^Ha_^Hr_^H _^Hi_^Hn_^Hv_^H(wrap)
i_^Ht_^He_^Hs | _^Hu_^Hp_^Hd_^Ha_^Ht_^He_^H _^Hn_^Ho_^Ht_^Hi_^Hf_^H(wrap)
i_^Hc_^Ha_^Ht_^Hi_^Ho_^Hn_^H _^Hp_^Hr_^He_^Hf_^He_^Hr_^He_^Hn_^Hc_^He_^Hs

Unicode punctuation characters in a footer block:

T: +1 123 456 78900\302^H^H\240\302^H^H\240\302^H^H\240M: +1 123 456 67890
---

Thanks,
Jeff

[1] http://www.djcbsoftware.nl/code/mu/mu4e/Displaying-rich_002dtext-messages.html
[2] Given sufficient time to configure a solution, it might be nice to run html2text over my maildir to append a text version to HTML-only messages, leaving the HTML original intact. Has anyone described a method for doing this non-destructively, i.e. conservatively and with a good bailout on poorly formed email?
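
For illustration only, the kind of mapping asked about in (b) might look like the minimal Python sketch below. It is not mu's filter and not part of mu; strip_overstrike is a hypothetical helper name. It assumes the ^H shown in the examples is a literal backspace (U+0008), treats each "character, backspace" pair as an overstrike and keeps only the character printed last, and maps no-break spaces (U+00A0, which is the byte pair \302\240 in UTF-8) to plain spaces. It does not try to repair the footer case where backspaces sit between the two bytes of a single character.

```python
import re

# A character followed by a backspace is the hidden half of an overstrike
# pair: "F\bF" renders as bold F, "_\bA" as underlined A. Newlines are
# excluded so a stray backspace never swallows a line break.
_OVERSTRIKE = re.compile(r"[^\x08\n]\x08")


def strip_overstrike(text: str) -> str:
    """Hypothetical helper: reduce overstrike sequences to plain text."""
    # Drop every "X\b" pair, keeping only the character printed last.
    cleaned = _OVERSTRIKE.sub("", text)
    # Remove any backspaces that were not part of a pair.
    cleaned = cleaned.replace("\x08", "")
    # Map no-break space (U+00A0) to an ordinary space.
    return cleaned.replace("\u00a0", " ")


if __name__ == "__main__":
    sample = "F\bFr\bro\bom\bm:\b: First Last"
    print(strip_overstrike(sample))  # From: First Last
```

Applied to the decoded text of the citation header above, this reduces the first line to "From: First Last" and the due-date line to "Due Date: Sep, 30 2014".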
