On 26/11/2021 15.47, sebb wrote:
On Fri, 26 Nov 2021 at 14:37, Daniel Gruno <[email protected]> wrote:

On 26/11/2021 15.21, sebb wrote:
That does not work for headers with multiple values

The four headers in question (to, from, subject, message-id) should
never have multiple values, right?

To and From (as long as Sender is provided) can both have multiple values.

As can other headers (e.g. irt) that are stored in the mbox index.

Right, what I should say is, they don't have multiple values in a way that matters to our database. Trimming the excess whitespace is therefore fine the way it currently is.

For headers where we may care about multiple values (IRT and References), we need to figure out how to split those properly. Elasticsearch doesn't care whether a value is a single string or an array of strings, so the mappings and search should accept an updated version of those two fields without any changes to the db.
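
As a rough illustration of the splitting idea above, here is a minimal sketch that pulls the angle-bracketed message-ids out of a References or In-Reply-To value and returns them as a list. The helper name `split_msgids` and the regular expression are assumptions made for this example; they are not taken from archiver.py.

```python
import re

# Assumption: message-ids are the angle-bracketed tokens in the header value.
# Anything outside angle brackets (e.g. free-form text that some older
# clients put into In-Reply-To) is ignored by this sketch.
MSGID_RE = re.compile(r"<[^<>\s]+>")


def split_msgids(header_value):
    """Return the message-ids found in a header value, in order of appearance."""
    if not header_value:
        return []
    return MSGID_RE.findall(header_value)


if __name__ == "__main__":
    # A folded References value with three ids
    print(split_msgids("<a@example.org>\r\n <b@example.org> <c@example.org>"))
```

Since a single id yields a one-element list, a field built this way could be stored as an array in every case, which might be simpler for search code than mixing strings and arrays.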


Indeed even single-valued headers can have embedded new-lines, so
fixing the ends of the string is not sufficient.

Removing line-wraps is tricky to do correctly, so the code should use
the appropriate email methods to improve compatibility with the RFCs.
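
One possible way to lean on the standard library for this, sketched under the assumption that the message can be parsed with `email.policy.default`: with that policy, fetching a header returns the value with the fold line breaks removed (and RFC 2047 encoded words decoded), so no manual newline handling is needed. The helper name `unfold_header` is hypothetical, and this is not how archiver.py currently reads headers.

```python
from email import message_from_string, policy


def unfold_header(raw_message, name):
    """Return the named header as an unfolded string, or None if it is absent.

    Sketch only: relies on email.policy.default, which strips the CR/LF of
    folded header lines when the header is fetched.
    """
    msg = message_from_string(raw_message, policy=policy.default)
    value = msg.get(name)
    return None if value is None else str(value)


if __name__ == "__main__":
    raw = (
        "Subject: a long\n"
        " subject line\n"
        "From: a@example.org\n"
        "\n"
        "body\n"
    )
    print(unfold_header(raw, "Subject"))
```

The whitespace that followed each fold is kept as it appeared in the message, so a final `strip()` on the result may still be wanted for leading or trailing blanks.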


On Fri, 26 Nov 2021 at 13:50, <[email protected]> wrote:

This is an automated email from the ASF dual-hosted git repository.

humbedooh pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-ponymail-foal.git


The following commit(s) were added to refs/heads/master by this push:
       new 146f15c  Strip superfluous whitespace from vital headers
146f15c is described below

commit 146f15cc5a97d741bcfcd5a6584f82a49490d053
Author: Daniel Gruno <[email protected]>
AuthorDate: Fri Nov 26 14:50:20 2021 +0100

      Strip superfluous whitespace from vital headers
---
   tools/archiver.py | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/archiver.py b/tools/archiver.py
index 8256fac..b3859ea 100755
--- a/tools/archiver.py
+++ b/tools/archiver.py
@@ -475,7 +475,7 @@ class Archiver(object):  # N.B. Also used by import-mbox.py
                               )
                           else:
                               hval += t[0].decode(t[1], errors="ignore")
-                    msg_metadata[key] = hval
+                    msg_metadata[key] = hval.strip()
               except Exception as err:
                   print("Could not decode headers, ignoring..: %s" % err)
           message_date = None

