On 24/12/2020 22:17, Yves Goergen via Exim-users wrote:
I'm parsing Exim log files, specifically the mainlog. Man, that's a complex
structure and it's hard to find all necessary details from the documentation
and by reading my actual log files. I'm using several regular expressions for
different kinds of lines. But a stateful parser (the ones used to understand
programming languages) would probably have been the better choice here. Apache
access logs just require a single regex, for Exim I already have 8, one of
which just covers most meaningless messages I don't care about, and lots of
detailed post-processing.
The logs are really designed for human use, not for machine consumption.
What assumptions can I make about the format of a queue message ID? For now, I
use this regex:
[^ ]+
Though it seems they always match this regex:
[0-9A-Za-z]{6}-[0-9A-Za-z]{6}-[0-9A-Za-z]{2}
It may change at any time as a result of future development.
There's a relevant comment in the source:
/* Now build the unique message id. This has changed several times over the
lifetime of Exim. This description was rewritten for Exim 4.14 (February 2003).
...
I *think* that some high-volume sites are at or close to performance limits [1]
that the current format imposes, hence I must reiterate: this (the message_id
format) is not supposed to be an exported interface. Its only documented
behaviour is that it is unique.
It's fairly reasonable to assume it'll never have an embedded space. I would not
recommend trying to extract meaning from it.
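
To illustrate treating the id as an opaque token, here is a minimal Python
sketch. It only assumes the id contains no space and uses it purely as a key
to group related lines. The line layout (timestamp, id, flag), the set of
flags matched, the sample lines, and the name group_by_message_id are all
assumptions made for this example; logs written with extra log_selector
options (pid, timezone) would need a different pattern.

```python
import re

# Assumed layout: "YYYY-MM-DD HH:MM:SS <id> <flag> [rest]".
# The id is matched only as "a run of non-space characters"; nothing is
# assumed about its length or internal structure.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<msgid>[^ ]+) "
    r"(?P<flag><=|=>|->|\*\*|==|Completed)"  # illustrative subset of flags
    r"(?: (?P<rest>.*))?$"
)


def group_by_message_id(lines):
    """Group the flags seen for each message id, keeping the id opaque."""
    groups = {}
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # no recognisable id/flag pair on this line; skip it
        groups.setdefault(m.group("msgid"), []).append(m.group("flag"))
    return groups


if __name__ == "__main__":
    # Made-up sample lines, not taken from a real mainlog.
    sample = [
        "2020-12-24 22:17:01 1ksXyZ-0001Ab-Cd <= alice@example.org S=1234",
        "2020-12-24 22:17:02 1ksXyZ-0001Ab-Cd => bob@example.net R=dnslookup",
        "2020-12-24 22:17:02 1ksXyZ-0001Ab-Cd Completed",
        "2020-12-24 22:17:03 Start queue run: pid=4242",
    ]
    print(group_by_message_id(sample))
```

Because the id is only ever compared for equality, a change in its length or
alphabet would not break this kind of grouping.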
--
Cheers,
Jeremy
[1] Unfortunately, nobody ever gives any real feedback to developers.