On Fri, 26 Nov 2021 at 14:58, Daniel Gruno <[email protected]> wrote:
>
> On 26/11/2021 15.47, sebb wrote:
> > On Fri, 26 Nov 2021 at 14:37, Daniel Gruno <[email protected]> wrote:
> >>
> >> On 26/11/2021 15.21, sebb wrote:
> >>> That does not work for headers with multiple values
> >>
> >> The four headers in question (to, from, subject, message-id) should
> >> never have multiple values, right?
> >
> > To and From (as long as Sender is provided) can both have multiple values.
> >
> > As can other headers (e.g. irt) that are stored in the mbox index.
>
> Right, what I should say is, they don't have multiple values in a way
> that matters to our database. Trimming the excess whitespace is therefore
> fine the way it currently is.

That remains to be seen; I am not convinced.

> For headers where we may care about multiple values (IRT and
> References), we need to figure out how to split those properly.
> ElasticSearch doesn't care if a value is a single string or an array of
> strings, so the mappings and search should allow for an updated version
> of those two fields without any changes to the db.

Surely that depends on the mapping type and the sort of searches?

> >
> > Indeed even single-valued headers can have embedded new-lines, so
> > fixing the ends of the string is not sufficient.
> >
> > Removing line-wraps is tricky to do correctly, so the code should use
> > the appropriate email methods to improve compatibility with the RFCs.
> >
> >>>
> >>> On Fri, 26 Nov 2021 at 13:50, <[email protected]> wrote:
> >>>>
> >>>> This is an automated email from the ASF dual-hosted git repository.
> >>>>
> >>>> humbedooh pushed a commit to branch master
> >>>> in repository
> >>>> https://gitbox.apache.org/repos/asf/incubator-ponymail-foal.git
> >>>>
> >>>>
> >>>> The following commit(s) were added to refs/heads/master by this push:
> >>>>      new 146f15c  Strip superfluous whitespace from vital headers
> >>>> 146f15c is described below
> >>>>
> >>>> commit 146f15cc5a97d741bcfcd5a6584f82a49490d053
> >>>> Author: Daniel Gruno <[email protected]>
> >>>> AuthorDate: Fri Nov 26 14:50:20 2021 +0100
> >>>>
> >>>>     Strip superfluous whitespace from vital headers
> >>>> ---
> >>>>  tools/archiver.py | 2 +-
> >>>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/tools/archiver.py b/tools/archiver.py
> >>>> index 8256fac..b3859ea 100755
> >>>> --- a/tools/archiver.py
> >>>> +++ b/tools/archiver.py
> >>>> @@ -475,7 +475,7 @@ class Archiver(object):  # N.B. Also used by import-mbox.py
> >>>>                              )
> >>>>                          else:
> >>>>                              hval += t[0].decode(t[1], errors="ignore")
> >>>> -                    msg_metadata[key] = hval
> >>>> +                    msg_metadata[key] = hval.strip()
> >>>>          except Exception as err:
> >>>>              print("Could not decode headers, ignoring..: %s" % err)
> >>>>          message_date = None
> >>
>
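
To illustrate the point about embedded new-lines and multi-valued headers, here is a minimal sketch using only the Python standard library. It is not the code in archiver.py: the sample message `RAW`, the helper `unfolded_headers`, and the regex `MSGID_RE` are hypothetical names made up for this example, and it assumes that parsing with `email.policy.default` is acceptable for the archiver, which the thread does not settle.

```python
import re
from email import message_from_string
from email.policy import default as default_policy

# Hypothetical sample message with folded (line-wrapped) headers.
RAW = (
    "From: Alice <alice@example.org>,\n"
    " Bob <bob@example.org>\n"
    "Subject: A subject that is long enough\n"
    " to be wrapped\n"
    "References: <1@example.org>\n"
    " <2@example.org>\n"
    "\n"
    "body\n"
)

# Assumption: message-ids are angle-bracketed tokens without whitespace.
MSGID_RE = re.compile(r"<[^<>\s]+>")


def unfolded_headers(raw: str) -> dict:
    """Parse with the modern policy, which unfolds header values on access."""
    msg = message_from_string(raw, policy=default_policy)
    return {
        # Unstructured header: returned as one line, no embedded new-line.
        "subject": str(msg["Subject"]),
        # Address header: the parser exposes each mailbox separately.
        "from": [a.addr_spec for a in msg["From"].addresses],
        # References is split here by regex into individual message-ids.
        "references": MSGID_RE.findall(str(msg.get("References", ""))),
    }


# With the legacy (compat32) policy the fold survives, and strip() only
# touches the ends of the string, so the embedded new-line remains.
legacy_subject = message_from_string(RAW)["Subject"].strip()
print(repr(legacy_subject))

result = unfolded_headers(RAW)
print(result)
```

Whether the resulting lists for From and References should be stored as arrays is a separate question that depends on the index mapping, as noted above.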
