Matthew, et al --

...and then Matthew D. Fuller said...
% 
% On Wed, Dec 26, 2001 at 09:22:33PM -0500 I heard the voice of
% David T-G, and lo! it spake thus:
% > 
% > Thus, it should be sufficient to match on any ^From_ line as long as
% > you're working with an mbox file (which you can confirm by checking the
...
% 
% Note that this can (also) break.

So I hear!


% 
% I was just testing some mbox-parsing code the other day, and I needed a
% quick mbox of reasonable size to test it against.  Hey, how about
% ~/mail/sent?

One would think so...


% 
% But it's got bare "^From " lines in mid-message where they 'naturally'
% appeared.  So, either you need a bit more smarts than just "^From ", or
% mutt doesn't write 'sent' as a true mbox.

And I trust that this all works when you open it with mutt, right?  [Hey,
it never hurts to check.]


% 
% The 'mbox' manpage from qmail says:
% ---
% MESSAGE FORMAT
%      A message encoded in mbox format begins with a  From_  line,
%      continues  with a series of non-From_ lines, and ends with a
%      blank line.  A From_ line means any line  that  begins  with
%      the characters F, r, o, m, space:
% 
%      [...]
% ---
% 
% Which seems to imply the POV that "^From " should be a sufficient pattern
% (in which case, watch out for your sent box!)

Yes, indeed.
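
For the record, here is a quick sketch of how that bites.  It is Python,
and the sample mbox text and the naive_split name are made up purely for
illustration.  It splits on every "^From " line, as the qmail definition
suggests, and counts what comes back:

```python
import re

# Hypothetical sample: two real messages, the second of which has a body
# line that happens to start with "From ".
SAMPLE = (
    "From alice@example.com Wed Dec 26 21:22:33 2001\n"
    "Subject: first\n"
    "\n"
    "Hello.\n"
    "\n"
    "From bob@example.com Thu Dec 27 08:00:00 2001\n"
    "Subject: second\n"
    "\n"
    "From what I can tell, this line is body text.\n"
    "\n"
)


def naive_split(text):
    """Split text at every line beginning with 'From ' (nothing smarter)."""
    starts = [m.start() for m in re.finditer(r"^From ", text, flags=re.MULTILINE)]
    ends = starts[1:] + [len(text)]
    return [text[a:b] for a, b in zip(starts, ends)]


messages = naive_split(SAMPLE)
print(len(messages))  # two messages went in, three come out
```

The bare body line gets treated as the start of a third "message", which
is exactly the sent-box problem described above.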


% 
% Mutt seems to use a bit more smarts.  See "is_from()" in from.c for
% details.

At the very least, Philip now has a more solid regexp definition:

  From [ <return-path> ] <weekday> <month> <day> <time> [ <timezone> ] <year>

would probably turn into something like

  ^From ([^\t\s@][^\t\s@]*@[^\t\s@][^\t\s@]*\.[^\t\s@][^\t\s@]*|)  \
    (Sun|Mon|Tue|Wed|Thu|Fri|Sat) \
    (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \
    [\s1-3][0-9] [0-2][0-9]:[0-5][0-9]:[0-5][0-9] \
    ([A-Z][A-Z][A-Z] |) [0-9][0-9][0-9][0-9]

(yes, I've faked it with line breaks just to keep things readable; note
the two spaces at the end of the first line, although that may not really
matter and [\s]* should perhaps be used instead).  No, I'm not going into
MIME-encoding of the header as seen in some ^From: lines.  No, this
doesn't allow for leap seconds (but *probably* all one needs is to let the
first digit of the seconds be a 6 as well).  And yes, this will break at
year 10000; apparently y2k taught me nothing :-)
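
If anyone wants to play with it, here is a rough Python rendering of the
same idea.  This is a sketch, not mutt's is_from(): the names FROM_LINE
and is_from_line are my own, and I have assumed a 24-hour clock, any
number of spaces after the return path, and seconds up to 60.  Note that a
return path without an @ and a dot (MAILER-DAEMON, say) will not match.

```python
import re

# Verbose mode ignores bare whitespace, so literal spaces are written "[ ]".
FROM_LINE = re.compile(
    r"""
    ^From[ ]
    (?:[^\s@]+@[^\s@]+\.[^\s@]+)?            # optional return path
    [ ]*                                     # zero or more spaces (assumption)
    (?:Sun|Mon|Tue|Wed|Thu|Fri|Sat)[ ]
    (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[ ]
    [ 1-3][0-9][ ]                           # day of month, space-padded
    [0-2][0-9]:[0-5][0-9]:[0-6][0-9][ ]      # 24-hour time, leap second allowed
    (?:[A-Z]{3}[ ])?                         # optional three-letter timezone
    [0-9]{4}                                 # four-digit year
    \s*$
    """,
    re.VERBOSE,
)


def is_from_line(line):
    """Return True if line looks like an mbox From_ separator."""
    return FROM_LINE.match(line) is not None


print(is_from_line("From alice@example.com Wed Dec 26 21:22:33 2001"))
print(is_from_line("From what I can tell, this line is body text."))
```

The first call should come back True and the second False, so a bare
"From " at the start of a body line no longer splits the message.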


% 
% -- 
% Matthew Fuller     (MF4839)     |    [EMAIL PROTECTED]
% Unix Systems Administrator      |    [EMAIL PROTECTED]
% Specializing in FreeBSD         |    http://www.over-yonder.net/
% 
% "The only reason I'm burning my candle at both ends, is because I
%       haven't figured out how to light the middle yet"

HTH & HAND & Happy Holidays to all


:-D
-- 
David T-G                      * It's easier to fight for one's principles
(play) [EMAIL PROTECTED] * than to live up to them. -- fortune cookie
(work) [EMAIL PROTECTED]
http://www.justpickone.org/davidtg/    Shpx gur Pbzzhavpngvbaf Qrprapl Npg!
