Jeff King <[email protected]> writes:
> It looks like we have a reasonably sane is_from_line() function. So at
> least _we_ will not generally break on reading our own output, except in
> some extreme circumstances (you'd have to come up with something
> contrived like "From me, at 10:30 30 minutes before 11!").
>
> We can pretty easily reuse this to make the from-line check in mailinfo
> more robust. Here's a replacement for the patch I sent earlier that
> keeps the "magically ignore extra >From headers" behavior but fixes the
> case that started this discussion.
Why cache.h when this is still only between mail{info,split}.c both
of which do not really deal with any "Git" data?
For mailsplit, we are trying to detect mbox boundary various MUAs
would use in their output, and is_from_line() may be appropriate,
but I am not sure if the same logic is appropriate for mailinfo.
What are we trying to protect us against? Those who paste a single
e-mail output from format-patch in full? Do people paste a single
e-mail received to their mailbox from somebody else and do we need
to protect against that, skipping the ">From " thing, while risking
to skip "From me at 10:30 30 minutes..."?
If we only want to skip ">?From" in pasted format-patch output, we
would want a rule in mailinfo that is tighter than is_from_line() in
mailsplit.
By the way, I see ">From " in is_rfc2822_header(), too. Do we have
to worry about this comparison as well?
> -- >8 --
> Subject: mailinfo: make ">From" in-body header check more robust
>
> Since commit 81c5cf7 (mailinfo: skip bogus UNIX From line
> inside body, 2006-05-21), we have treated lines like ">From"
> in the body as headers. This makes "git am" work for people
> who erroneously paste the whole mbox entry:
>
> From 12345abcd...
> From: them
> Subject: [PATCH] whatever
>
> into their email body (assuming that an mbox writer then
> quotes "From" as ">From", as otherwise we would actually
> mailsplit on the in-body line).
>
> However, this has false positives if somebody actually has a
> commit body that starts with "From "; in this case we
> erroneously remove the line entirely from the commit
> message. We can make this check more robust by making sure
> the line actually looks like a real mbox "From" line.
>
> To do this we pull the "is_from_line" definition out of
> mailsplit, and make it available for general use.
>
> Signed-off-by: Jeff King <[email protected]>
> ---
> Makefile | 1 +
> builtin/mailinfo.c | 2 +-
> builtin/mailsplit.c | 31 -------------------------------
> cache.h | 6 ++++++
> mbox.c | 32 ++++++++++++++++++++++++++++++++
> t/t5100-mailinfo.sh | 18 ++++++++++++++++++
> t/t5100/embed-from.expect | 5 +++++
> t/t5100/embed-from.in | 13 +++++++++++++
> t/t5100/quoted-from.expect | 3 +++
> t/t5100/quoted-from.in | 10 ++++++++++
> 10 files changed, 89 insertions(+), 32 deletions(-)
> create mode 100644 mbox.c
> create mode 100644 t/t5100/embed-from.expect
> create mode 100644 t/t5100/embed-from.in
> create mode 100644 t/t5100/quoted-from.expect
> create mode 100644 t/t5100/quoted-from.in
>
> diff --git a/Makefile b/Makefile
> index e0f15a3..e018450 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -704,6 +704,7 @@ LIB_OBJS += lockfile.o
> LIB_OBJS += log-tree.o
> LIB_OBJS += mailmap.o
> LIB_OBJS += match-trees.o
> +LIB_OBJS += mbox.o
> LIB_OBJS += merge.o
> LIB_OBJS += merge-blobs.o
> LIB_OBJS += merge-recursive.o
> diff --git a/builtin/mailinfo.c b/builtin/mailinfo.c
> index cf11c8d..a434d39 100644
> --- a/builtin/mailinfo.c
> +++ b/builtin/mailinfo.c
> @@ -329,7 +329,7 @@ static int check_header(const struct strbuf *line,
>
> /* for inbody stuff */
> if (starts_with(line->buf, ">From") && isspace(line->buf[5])) {
> - ret = 1; /* Should this return 0? */
> + ret = is_from_line(line->buf + 1, line->len - 1);
> goto check_header_out;
> }
> if (starts_with(line->buf, "[PATCH]") && isspace(line->buf[7])) {
> diff --git a/builtin/mailsplit.c b/builtin/mailsplit.c
> index 763cda0..11ba281 100644
> --- a/builtin/mailsplit.c
> +++ b/builtin/mailsplit.c
> @@ -12,37 +12,6 @@
> static const char git_mailsplit_usage[] =
> "git mailsplit [-d<prec>] [-f<n>] [-b] [--keep-cr] -o<directory>
> [(<mbox>|<Maildir>)...]";
>
> -static int is_from_line(const char *line, int len)
> -{
> - const char *colon;
> -
> - if (len < 20 || memcmp("From ", line, 5))
> - return 0;
> -
> - colon = line + len - 2;
> - line += 5;
> - for (;;) {
> - if (colon < line)
> - return 0;
> - if (*--colon == ':')
> - break;
> - }
> -
> - if (!isdigit(colon[-4]) ||
> - !isdigit(colon[-2]) ||
> - !isdigit(colon[-1]) ||
> - !isdigit(colon[ 1]) ||
> - !isdigit(colon[ 2]))
> - return 0;
> -
> - /* year */
> - if (strtol(colon+3, NULL, 10) <= 90)
> - return 0;
> -
> - /* Ok, close enough */
> - return 1;
> -}
> -
> static struct strbuf buf = STRBUF_INIT;
> static int keep_cr;
>
> diff --git a/cache.h b/cache.h
> index dfa1a56..9e71ca5 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1568,4 +1568,10 @@ void stat_validity_update(struct stat_validity *sv,
> int fd);
>
> int versioncmp(const char *s1, const char *s2);
>
> +/*
> + * Returns true if the line appears to be an mbox "From" line starting a new
> + * message.
> + */
> +int is_from_line(const char *line, int len);
> +
> #endif /* CACHE_H */
> diff --git a/mbox.c b/mbox.c
> new file mode 100644
> index 0000000..75f3150
> --- /dev/null
> +++ b/mbox.c
> @@ -0,0 +1,32 @@
> +#include "cache.h"
> +
> +int is_from_line(const char *line, int len)
> +{
> + const char *colon;
> +
> + if (len < 20 || memcmp("From ", line, 5))
> + return 0;
> +
> + colon = line + len - 2;
> + line += 5;
> + for (;;) {
> + if (colon < line)
> + return 0;
> + if (*--colon == ':')
> + break;
> + }
> +
> + if (!isdigit(colon[-4]) ||
> + !isdigit(colon[-2]) ||
> + !isdigit(colon[-1]) ||
> + !isdigit(colon[ 1]) ||
> + !isdigit(colon[ 2]))
> + return 0;
> +
> + /* year */
> + if (strtol(colon+3, NULL, 10) <= 90)
> + return 0;
> +
> + /* Ok, close enough */
> + return 1;
> +}
> diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
> index 3e64a7a..9e1ad1c 100755
> --- a/t/t5100-mailinfo.sh
> +++ b/t/t5100-mailinfo.sh
> @@ -89,4 +89,22 @@ test_expect_success 'mailinfo on from header without name
> works' '
>
> '
>
> +test_expect_success 'mailinfo finds headers after embedded From line' '
> + mkdir embed-from &&
> + git mailsplit -oembed-from "$TEST_DIRECTORY"/t5100/embed-from.in &&
> + test_cmp "$TEST_DIRECTORY"/t5100/embed-from.in embed-from/0001 &&
> + git mailinfo embed-from/msg embed-from/patch \
> + <embed-from/0001 >embed-from/out &&
> + test_cmp "$TEST_DIRECTORY"/t5100/embed-from.expect embed-from/out
> +'
> +
> +test_expect_success 'mailinfo on message with quoted >From' '
> + mkdir quoted-from &&
> + git mailsplit -oquoted-from "$TEST_DIRECTORY"/t5100/quoted-from.in &&
> + test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.in quoted-from/0001 &&
> + git mailinfo quoted-from/msg quoted-from/patch \
> + <quoted-from/0001 >quoted-from/out &&
> + test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.expect quoted-from/msg
> +'
> +
> test_done
> diff --git a/t/t5100/embed-from.expect b/t/t5100/embed-from.expect
> new file mode 100644
> index 0000000..06a3a38
> --- /dev/null
> +++ b/t/t5100/embed-from.expect
> @@ -0,0 +1,5 @@
> +Author: Commit Author
> +Email: [email protected]
> +Subject: patch subject
> +Date: Sat, 13 Sep 2014 21:13:23 -0400
> +
> diff --git a/t/t5100/embed-from.in b/t/t5100/embed-from.in
> new file mode 100644
> index 0000000..5f3f84e
> --- /dev/null
> +++ b/t/t5100/embed-from.in
> @@ -0,0 +1,13 @@
> +From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
> +From: Email Author <[email protected]>
> +Date: Sun, 25 May 2008 00:38:18 -0700
> +Subject: [PATCH] email subject
> +
> +>From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
> +From: Commit Author <[email protected]>
> +Date: Sat, 13 Sep 2014 21:13:23 -0400
> +Subject: patch subject
> +
> +patch body
> +---
> +patch
> diff --git a/t/t5100/quoted-from.expect b/t/t5100/quoted-from.expect
> new file mode 100644
> index 0000000..8c9d48c
> --- /dev/null
> +++ b/t/t5100/quoted-from.expect
> @@ -0,0 +1,3 @@
> +>From the depths of history, we are stuck with the
> +flaky mbox format.
> +
> diff --git a/t/t5100/quoted-from.in b/t/t5100/quoted-from.in
> new file mode 100644
> index 0000000..847e1c4
> --- /dev/null
> +++ b/t/t5100/quoted-from.in
> @@ -0,0 +1,10 @@
> +From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
> +From: Author Name <[email protected]>
> +Date: Sun, 25 May 2008 00:38:18 -0700
> +Subject: [PATCH] testing quoted >From
> +
> +>From the depths of history, we are stuck with the
> +flaky mbox format.
> +
> +---
> +patch
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html