On Fri, 26 Nov 2021 at 14:58, Daniel Gruno <[email protected]> wrote:
>
> On 26/11/2021 15.47, sebb wrote:
> > On Fri, 26 Nov 2021 at 14:37, Daniel Gruno <[email protected]> wrote:
> >>
> >> On 26/11/2021 15.21, sebb wrote:
> >>> That does not work for headers with multiple values
> >>
> >> The four headers in question (to, from, subject, message-id) should
> >> never have multiple values, right?
> >
> > To and From (as long as Sender is provided) can both have multiple values.
> >
> > As can other headers (e.g. irt) that are stored in the mbox index.
>
> Right, what I should have said is that they don't have multiple values in a way
> that matters to our database. Trimming the excess whitespace is therefore
> fine the way it currently is.

That remains to be seen; I am not convinced.
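
For example, a From header that names two authors is legal. Splitting it is exactly what the email package is for. Below is a minimal sketch that assumes a made-up two-author From value. `getaddresses` is the standard-library helper in `email.utils`.

```python
from email.utils import getaddresses

# Hypothetical From header naming two authors. RFC 5322 permits a mailbox-list
# in From, provided a Sender header is also present.
from_value = "Alice Example <alice@example.org>, Bob Example <bob@example.org>"

# getaddresses() takes a list of header values and returns
# (realname, address) pairs, one per mailbox.
pairs = getaddresses([from_value])
addresses = [addr for _name, addr in pairs]
print(addresses)
```

Trimming whitespace from the whole string does nothing to separate the two mailboxes.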

> For headers where we may care about multiple values (IRT and
> References), we need to figure out how to split those properly.
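
Here is one possible way to split them, offered only as a rough sketch: extract each angle-bracketed msg-id with a regex. The helper name `split_msg_ids` and the sample value are hypothetical. The pattern is a simplification of the RFC 5322 msg-id grammar, not a full parser.

```python
import re

# Hypothetical References value, folded across two lines as a client might send it.
references = "<msg1@example.org>\r\n <msg2@example.org> <msg3@example.org>"

# Simplified msg-id pattern: anything between < and > without whitespace.
MSG_ID = re.compile(r"<[^<>\s]+>")


def split_msg_ids(value: str) -> list[str]:
    """Return the angle-bracketed message-ids found in a References/IRT value."""
    return MSG_ID.findall(value)


print(split_msg_ids(references))
```

The regex ignores text outside the brackets, such as the comments some older clients append to In-Reply-To. However, an email address in angle brackets inside such a comment would also match, so the approach would need checking against real archives.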

> ElasticSearch doesn't care if a value is a single string or an array of
> strings, so the mappings and search should allow for an updated version
> of those two fields without any changes to the db.

Surely that depends on the mapping type and the sort of searches?

> >
> > Indeed even single-valued headers can have embedded new-lines, so
> > fixing the ends of the string is not sufficient.
> >
> > Removing line-wraps is tricky to do correctly, so the code should use
> > the appropriate email methods to improve compatibility with the RFCs.
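
Here is a sketch of what using the email methods could look like. It is an assumption about the approach, not what archiver.py currently does. When a message is parsed with `email.policy.default`, header access returns values that are already unfolded and RFC 2047-decoded. The sample message is made up.

```python
from email import policy
from email.parser import BytesParser

# Hypothetical raw message: the Subject contains an RFC 2047 encoded-word
# and is folded onto a second line.
raw = (
    b"From: Alice Example <alice@example.org>\r\n"
    b"Subject: Re: =?utf-8?q?caf=C3=A9?= menu and a rather long\r\n"
    b" tail that the client folded\r\n"
    b"To: dev@example.org\r\n"
    b"\r\n"
    b"body\r\n"
)

# With policy.default, msg["subject"] is a header object whose str value
# has the fold removed and the encoded-word decoded.
msg = BytesParser(policy=policy.default).parsebytes(raw)
subject = str(msg["subject"])
print(subject)
```
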
> >
> >>>
> >>> On Fri, 26 Nov 2021 at 13:50, <[email protected]> wrote:
> >>>>
> >>>> This is an automated email from the ASF dual-hosted git repository.
> >>>>
> >>>> humbedooh pushed a commit to branch master
> >>>> in repository https://gitbox.apache.org/repos/asf/incubator-ponymail-foal.git
> >>>>
> >>>>
> >>>> The following commit(s) were added to refs/heads/master by this push:
> >>>>        new 146f15c  Strip superfluous whitespace from vital headers
> >>>> 146f15c is described below
> >>>>
> >>>> commit 146f15cc5a97d741bcfcd5a6584f82a49490d053
> >>>> Author: Daniel Gruno <[email protected]>
> >>>> AuthorDate: Fri Nov 26 14:50:20 2021 +0100
> >>>>
> >>>>       Strip superfluous whitespace from vital headers
> >>>> ---
> >>>>    tools/archiver.py | 2 +-
> >>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/tools/archiver.py b/tools/archiver.py
> >>>> index 8256fac..b3859ea 100755
> >>>> --- a/tools/archiver.py
> >>>> +++ b/tools/archiver.py
> >>>> @@ -475,7 +475,7 @@ class Archiver(object):  # N.B. Also used by import-mbox.py
> >>>>                                )
> >>>>                            else:
> >>>>                                hval += t[0].decode(t[1], errors="ignore")
> >>>> -                    msg_metadata[key] = hval
> >>>> +                    msg_metadata[key] = hval.strip()
> >>>>                except Exception as err:
> >>>>                    print("Could not decode headers, ignoring..: %s" % err)
> >>>>            message_date = None
> >>
>
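
The snippet below illustrates the concern with the commit above. It assumes the header value still contains its fold when it reaches `hval`, which depends on how the message was parsed. In that case, `.strip()` only touches the ends. The regex follows the RFC 5322 section 2.2.3 definition of unfolding and is included for comparison only. It does not handle RFC 2047 encoded-words, which is one reason to prefer the email package.

```python
import re

# Hypothetical raw Subject value: a fold (CRLF followed by whitespace)
# in the middle, plus stray whitespace at both ends.
raw = "  Re: a long subject line that\r\n the sender's client folded  "

# What the committed change does: only the ends are trimmed.
trimmed = raw.strip()
print(repr(trimmed))  # the embedded "\r\n " is still there

# RFC 5322 section 2.2.3: unfolding removes any CRLF immediately followed by WSP.
# Simplified illustration; the email package handles more cases than this.
unfolded = re.sub(r"\r\n(?=[ \t])", "", raw).strip()
print(repr(unfolded))
```
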
