Phil Pennock <[email protected]> (Mo 13 Jul 2009 23:44:15 CEST):
> On 2009-07-13 at 22:54 +0200, Karl Fischer wrote:
> > I followed this thread with interest and I'm still a little puzzled with the
> > specific exim syntax, but in terms of regex and just extracting the header
> > names, this perl regex should be more efficient: s/:.*?\n(\s+.*?\n)*/:/g
> >
> > This saves looping through map/extract by getting rid of the unwanted 1st.
>
> Good point.
>
> However, you're also not stripping out space between the header name and
> the following colon, which is valid. This email could validly be
> constructed with:
> ----------------------------8< cut here >8------------------------------
> From: Phil ....
> To : Karl ...
> Cc : exim-users ....
> ----------------------------8< cut here >8------------------------------

Ah. Thanks for the hint.
> With a little further optimisation, we get:
>
> s/(?>\s*:.*?\n)(?>\s+.*?\n)*/:/g
>
> although actually I'm not sure there would be any backtracking needed
> for your s///g and it's probably only the \s*: that benefits from the
> protection. (I can't be bothered to benchmark it).
>
> > In exim syntax I'd assume this to be (not tested yet):
> >
> > MESSAGE_HEADERS = ${lc:${sg
> > {$message_headers_raw}{\N:.*?\n(\s+.*?\n)*\N}{:}}}
>
> ${lc:${sg{$message_headers_raw}{\N(?>\s*:.*?\n)(?>\s+.*?\n)*\N}{:}}}
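
The difference between the two tail-cutting patterns can be checked outside Exim. What follows is only a rough Python sketch, assuming Python's re module behaves like PCRE for these constructs. The sample headers are made up, and the atomic groups (?>...) are replaced by plain non-capturing groups because older Python versions do not accept them:

```python
import re

# Hypothetical sample headers: "To :" has a space before the colon and
# "Subject" is folded onto a continuation line.
RAW_HEADERS = (
    "From: Phil <phil@example.org>\n"
    "To : Karl <karl@example.org>\n"
    "Subject: a long\n"
    "\tfolded subject\n"
)

# Karl's pattern: cut from the colon to the end of the logical header line.
KARL = r":.*?\n(\s+.*?\n)*"

# Phil's pattern, with (?>...) swapped for (?:...) (an assumption made for
# portability; it only affects backtracking, not what is matched here).
PHIL = r"\s*:.*?\n(?:\s+.*?\n)*"

karl_names = re.sub(KARL, ":", RAW_HEADERS).lower()
phil_names = re.sub(PHIL, ":", RAW_HEADERS).lower()

print(repr(karl_names))  # 'from:to :subject:'
print(repr(phil_names))  # 'from:to:subject:'
```

On this sample the only difference is the space left in "to :" by the first pattern.
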
I'm still sticking with my version. Instead of cutting away the tail, I'm
selecting the head of the logical header line:
${lc:${sg {$message_headers_raw}{\N(?m)(^\S+(?=\s*):)?.*?\n\N}{\$1}}}
But I'm not sure about efficiency or readability.
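
The head-selecting pattern can be tried outside Exim as well. This is a minimal Python sketch, not tested in Exim itself, assuming Python's re module treats these constructs the way PCRE does. The sample headers are made up, and VARIANT is a hypothetical alternative, not something proposed in the thread:

```python
import re

# Hypothetical sample headers: "To :" has a space before the colon and
# "Subject" is folded onto a continuation line.
RAW_HEADERS = (
    "From: Phil <phil@example.org>\n"
    "To : Karl <karl@example.org>\n"
    "Subject: a long\n"
    "\tfolded subject\n"
)

# The head-selecting pattern as posted: keep group 1, drop the rest of the line.
HEIKO = r"(?m)(^\S+(?=\s*):)?.*?\n"
heiko_names = re.sub(HEIKO, r"\1", RAW_HEADERS).lower()

# Hypothetical variant: capture the name and the colon separately, so that
# whitespace between them is skipped instead of preventing the match.
VARIANT = r"(?m)^(?:([^\s:]+)\s*(:))?.*?\n"
variant_names = re.sub(VARIANT, r"\1\2", RAW_HEADERS).lower()

print(repr(heiko_names))    # 'from:subject:'
print(repr(variant_names))  # 'from:to:subject:'
```

In this sketch the "To :" line drops out of the first result: the lookahead (?=\s*) consumes nothing, so the literal colon still has to follow the name directly. Whether the variant is any more readable is another question.
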
--
Heiko
