[Mailman-Users] How to wrap text in archived messages

2022-05-23 Thread Stephen J. Turnbull
Mark Dale via Mailman-Users writes:

 > I'm looking for a way to wrap lines in archived messages.

Executive summary: There's not really a good way to do this.  It's
extremely complicated, *especially* in email (as opposed to most
"normal" text) because of quoting conventions in email.

 > With zero understanding of Python my attempts to implement this
 > have failed so far and I may well be barking up the wrong tree
 > completely. Any clues or pointers gratefully received.

It's not your lack of Python, it's that reliably reformatting email
for different formats of email is a *very* hard problem in natural
language processing, and requires some knowledge of message user agent
internals.  And that's why Pipermail punts by just wrapping the whole
thing in a PRE element.  Works for Mutt users (= Unix email elders).

Gory details follow (because I think it's an interesting problem!)

 > Looking at the HTML page source -- in both cases (wrapped and
 > unwrapped) I see the message content is enclosed by PRE tags.

Right.  PRE is not very pretty as HTML goes, but it works OK for all
RFC-conforming text/plain email.  I assume that in fact this
comes from text/plain parts created by the author's MUA, because the
agents that we use to transform a text/html part to text/plain will
format to a reasonable width such as 72 characters.

 > And the lines in that block that seem responsible for the PRE tags are ...
 > 
 > lines.insert(0, '<PRE>')
 > lines.append('</PRE>')
 > 
 > My question is: Can those PRE tags be removed and replaced with
 > something equivalent to PHP's "nl2br" (which inserts a line break
 > BR in place of new line entries)?

No, because there *are no* newlines to break those very long lines.
These MUAs use newline to mean "paragraph break", not "line break".
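
To see why, here is a small sketch of what an nl2br-style function
does to such a message.  The nl2br below is only an illustration that
mimics PHP's behaviour (it is not part of Mailman), and the sample
text is made up:

```python
def nl2br(s):
    # Insert an HTML line break before every newline, as PHP's nl2br does.
    return s.replace("\n", "<br />\n")

# One long "paragraph line" with no newline inside it, followed by an
# empty line and a second paragraph.
long_paragraph = ("word " * 40).strip()
message = long_paragraph + "\n\n" + "Second paragraph."

converted = nl2br(message)
longest = max(len(line) for line in converted.split("\n"))
```

The only breaks inserted are at the paragraph boundaries, so the
longest line is still far wider than 72 characters.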

You might get a better result in these messages by removing the "PRE"
tags, and wrapping each line with "<P>...</P>", but that's a real
hack, and almost certain to make RFC-conforming email look quite ugly,
because every line becomes a paragraph, and you'll lose all
indentation.  Eg, in the code blocks you posted, all the lines will
end up flush left.  If your members are posting code or poetry, or
using indented block quotations etc, they're likely to be extremely
unhappy with the result.
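
A rough sketch of that hack, for illustration only.  The helper name
lines_to_paragraphs is hypothetical, and it assumes the lines have
already been HTML-escaped:

```python
def lines_to_paragraphs(lines):
    """Hypothetical helper: wrap each non-empty line in a P element."""
    return ["<P>" + line + "</P>" for line in lines if line.strip()]

body = ["def f(x):", "    return x + 1", "", "&gt; quoted text"]
html = lines_to_paragraphs(body)
```

The leading spaces of the second line are still present in the HTML
source, but with default styling a browser collapses leading
whitespace inside a P element, so the indentation is not displayed.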

Python's standard library does have a textwrap module, but I'm not at
all sure it's suitable for this.  If you know that the long lines of a
message are actually paragraphs, you can use something like

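    from textwrap import wrap
    # work backward because wrapping changes indices of later lines
    for i in range(len(lines) - 1, -1, -1):
        NDT = detect_prefix(lines[i])
        # strip the prefix before wrapping so it isn't added twice;
        # wrap() returns [] for a blank line, so keep those as they are
        lines[i:i+1] = wrap(lines[i][len(NDT):], initial_indent=NDT,
                            subsequent_indent=NDT) or [lines[i]]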

If a line is indented or has a quoting prefix, you have to detect that
for yourself and set NDT to that prefix.  Something like

    import re
    prefix_re = re.compile(r'[ >]*')
    def detect_prefix(line):
        m = prefix_re.match(line)
        return m.group(0)

should capture most indentation and quoting prefixes, but there are
other conventions.
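
Putting the two fragments together, a minimal self-contained sketch
might look like the following.  The function name rewrap and the
default width are choices made for this illustration, not anything in
Pipermail, and it assumes every long line really is a whole paragraph:

```python
import re
from textwrap import wrap

PREFIX_RE = re.compile(r'[ >]*')

def rewrap(lines, width=72):
    """Re-wrap each line, repeating its indent/quote prefix on every
    output line.  Blank and prefix-only lines are kept as separators."""
    out = []
    for line in lines:
        prefix = PREFIX_RE.match(line).group(0)
        text = line[len(prefix):].rstrip()
        if not text:
            out.append(line.rstrip())
            continue
        out.extend(wrap(text, width=width,
                        initial_indent=prefix,
                        subsequent_indent=prefix))
    return out

sample = ["> " + "word " * 30, "", "    indented " + "text " * 20]
wrapped = rewrap(sample, width=40)
```

Note that this treats each input line independently, so a paragraph
that was already hard-wrapped by the sender's MUA is not re-flowed
into a single paragraph first.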

Whether you use P elements or the textwrap module, it's probably a
good idea to find out how long the long lines are, and what percentage
of the message they are, and avoid trying to wrap a message that looks
like it "mostly" has lines of reasonable length.  If you don't, and
your target is the old "typewriter standard" width of 66, and somebody
using an RFC-conforming MUA just prefers 72, you'll reformat their
mail into alternating lines of about 60 characters and 10 characters.
Yuck ...
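
One way to make that check, as a sketch only.  The function name
looks_unwrapped, the 72-column limit and the 50% threshold are all
assumptions to be tuned against real list traffic:

```python
from textwrap import wrap

def looks_unwrapped(lines, width=72, threshold=0.5):
    """Guess whether a message uses one long line per paragraph.

    Returns True when more than `threshold` of the message's characters
    sit on lines longer than `width`.
    """
    total = sum(len(line) for line in lines)
    if total == 0:
        return False
    long_chars = sum(len(line) for line in lines if len(line) > width)
    return long_chars / total > threshold

# The failure mode described above: a 71-character line that was
# reasonable at width 72 gets split when re-wrapped to 66.
line = ("abcde " * 12).strip()
pieces = [len(p) for p in wrap(line, width=66)]   # [65, 5]
```

Messages for which looks_unwrapped() returns False would be left
alone and shown in the PRE block exactly as they arrived.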

Which of the above would work better for you depends a lot on the
typical content of your list.  But issues with quoting and indentation
are likely to have you tearing your hair out.

Steve
--
Mailman-Users mailing list -- mailman-users@python.org
To unsubscribe send an email to mailman-users-le...@python.org
https://mail.python.org/mailman3/lists/mailman-users.python.org/
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: https://www.mail-archive.com/mailman-users@python.org/
https://mail.python.org/archives/list/mailman-users@python.org/


[Mailman-Users] How to wrap text in archived messages

2022-05-22 Thread Mark Dale via Mailman-Users
Hi,

I'm looking for a way to wrap lines in archived messages.

Messages from some mail clients (eg. Gmail) have their lines wrapped to 72 
chars in the archived version, while archived messages from others (eg. 
Thunderbird, Outlook) display unwrapped lines forcing the reader to scroll 
horizontally.

Looking at the HTML page source -- in both cases (wrapped and unwrapped) I see 
the message content is enclosed by PRE tags. 




<PRE>
Lorem ipsum dolor sit amet, consectetur ...

Ut enim ad minim veniam, quis nostrud ...
</PRE>

  



The template (article.html) contains the following:
...


%(body)s


...

From what I can figure out, the PRE tags come from 
.../mailman/Mailman/Archiver/HyperArch.py in a block of code, lines 1290 to 
1314 ...

///

    def format_article(self, article):
        # called from add_article
        # TBD: Why do the HTML formatting here and keep it in the
        # pipermail database?  It makes more sense to do the html
        # formatting as the article is being written as html and toss
        # the data after it has been written to the archive file.
        lines = filter(None, article.body)
        # Handle <HTML> </HTML> directives
        if self.ALLOWHTML:
            self.__processbody_HTML(lines)
        self.__processbody_URLquote(lines)
        if not self.SHOWHTML and lines:
            lines.insert(0, '<PRE>')
            lines.append('</PRE>')
        else:
            # Do fancy formatting here
            if self.SHOWBR:
                lines = map(lambda x:x + "<BR>", lines)
            else:
                for i in range(0, len(lines)):
                    s = lines[i]
                    if s[0:1] in ' \t\n':
                        lines[i] = '<P>' + s
        article.html_body = lines
        return article




And the lines in that block that seem responsible for the PRE tags are ...

    lines.insert(0, '<PRE>')
    lines.append('</PRE>')

My question is: Can those PRE tags be removed and replaced with something 
equivalent to PHP's "nl2br" (which inserts a line break BR in place of new line 
entries)?

A Google search for such an equivalent gives me ...

    def nl2br(s):
        return '<br />\n'.join(s.split('\n'))

With zero understanding of Python my attempts to implement this have failed so 
far and I may well be barking up the wrong tree completely. Any clues or 
pointers gratefully received.

Thanks.