Jonathan Tan <> writes:

> Instead of repeatedly calling "check_header" (as in this patch), one
> alternate method to accomplish this would be to have a buffer of
> potential header text in struct mailinfo to be flushed whenever a header
> is known to end (for example, if we detect the start of a new header),
> but this makes the logic more complicated - for example, the flushing
> would not only invoke check_header but would also need to reconstruct
> the original lines, possibly decode them into UTF-8, and store them in
> log_message, and any failures would be noticed a few "lines" away from
> the original failure point. Also, care would need to be taken to flush
> the buffer at all appropriate places.

I am not sure how much the UTF-8 decoding argument above matters.

The current way handle_commit_msg() is structured (before any of
your patches) is for it to take one raw line at a time and:

    - If we haven't seen a non-header line (i.e. at the beginning,
      or we were reading in-body headers), return without doing

    - If we are told to honor in-body headers and if we haven't seen
      a non-header line, see if the line itself looks like a header
      and if so, handle it as an in-body header and return.  If that
      line is not an in-body header, continue processing.

    - If the processing reaches at this point, we are done with the
      headers (i.e. mi->header_stage is set to 0).

    - Make sure the line is in utf8.

    - If it is a scissors line and we are told to honor scissors
      lines, ignore what we have read so far and go back to "we
      haven't seen a non-header line" state and return.

    - If it is a patch break, return and signal the caller we are
      done with the log message.

    - Otherwise accumulate the line as part of the log message.

The bug we want to address is in the second step.  We only look at
the first line of folded in-body header line, because we are fed one
line at a time.

If we keep the location of UTF8 conversion, and buffered the in-body
header in "struct mailinfo *mi" (like you seem to do in this patch),
what we will queue there will be _before_ conversion.  We'd call
check_header() on it once we know one logical line of a header is
accumulated, and check_header() would do the right conversion via
decode_header() etc., so I do not see why you need to worry about
the encoding issues at all.

I wonder if the simplest would be to introduce another state in the
state machine that is "we know we are processing in-body header, and
we have read early part of an in-body header line that may not be

In other words, wouldn't something like the illustration at the end
of this message sufficient?  If the body consists solely of in-body
header without any patch or patchbreak, we may reach EOF with
something in mi->in_line_header buffer and nothing in
mi->log_message and without this function getting any chance to
return 1, so a careful caller may want to flush in_line_header, but
the overall result of the mailinfo subsystem in such a case would be
an error ("you didn't have any patch or a message?"), so it may not
matter too much.

What am I missing?

        if (mi->in_line_header->len) {
                /* we have read the beginning of one in-line header */
                if (line->len && isspace(*line->buf))
                        append to mi->in_line_header strbuf;
                        return 0;
                /* otherwise we know mi->in_line_header is now complete */
                check_header(mi, mi->in_line_header, ...);

        if (mi->header_stage && (it is a blank line))
                return 0;

        if (mi->use_inbody_headers && mi->header_stage &&
            (the line looks like beginning of 2822 header)) {
                strbuf_addbuf(&mi->in_line_header, line);
                return 0;
        /* otherwise we are no longer looking at headers */
        mi->header_stage = 0;

        /* normalize the log message to UTF-8. */
        if (convert_to_utf8(mi, line, mi->charset.buf))
                return 0; /* mi->input_error already set */

        if (mi->use_scissors && is_scissors_line(line)) {
                int i;

                strbuf_setlen(&mi->log_message, 0);
                mi->header_stage = 1;

                 * We may have already read "secondary headers"; purge
                 * them to give ourselves a clean restart.
                for (i = 0; header[i]; i++) {
                        if (mi->s_hdr_data[i])
                        mi->s_hdr_data[i] = NULL;
                return 0;

        if (patchbreak(line)) {
                if (mi->message_id)
                                    "Message-Id: %s\n", mi->message_id);
                return 1;

        strbuf_addbuf(&mi->log_message, line);
        return 0;

Reply via email to