Hi Ben,
I sent you a private email earlier indicating that changes to the `rfcfold`
script addressing your concerns were in progress. I'm happy to announce that
an update containing these changes has just been posted (many thanks to my
co-author, Erik, CC-ed).
Please see my inline responses below as well.
Thanks,
Kent // co-author
>>> ----------------------------------------------------------------------
>>> DISCUSS:
>>> ----------------------------------------------------------------------
>>>
>>> I think the procedures described herein are incomplete without a footer
>>> to terminate the un-folding process. Otherwise, it seems that the
>>> described algorithms would leave the two-line header for the second and
>>> subsequent instances of folded text in a single document. (If we tried
>>> to just blindly remove all instances of the header without seeking
>>> boundaries, then we would misreconstruct content when different folding
>>> algorithms are used in the same document with the single-backslash
>>> algorithm occurring first.)
>>
>> Are you referring to when an RFC contains multiple inclusions and one is
>> trying to unfold them all at once? That's not the intention here, as
>
> Yes, that was what I was thinking; sorry for missing or misinterpreting the
> notes in Sections 7.2/8.2.
This issue is resolved.
>> noted in paragraph 3 in both sections 7.2 and 8.2. FWIW, this sounds
>> like the framing problem that the WG discussed, with the conclusion that
>> extracting from plain-text is dead now that XML is the required
>> submission format, and XML provides a framing mechanism superior to any
>> footer we could add.
>>
>> BTW, yes, each text inclusion in a single RFC may independently be folded
>> using either the '\' or '\\' strategy, with the recommendation that '\'
>> always be tried first and '\\' only used when '\' fails.
>>
>> If referring to a single text content instance, could you provide an
>> example illustrating the concern?
>>
>>
>>
>>
>>> I don't think it's proper to refer to a script that requires bash
>>> specifically as a "POSIX shell script". I did not attempt to check
>>> whether any bash-specific features are used or this requirement stems
>>> solely from the shebang line, though.
>>
>> I just changed "POSIX" to "Bash" in the title for Appendix A.
>>
>> Not that it matters, but "--posix" is passed into `bash` on the first
>> line of the script ;)
>>
>>
>>
>>> I think the shell script does need to use double-quotes around some
>>> variable expansions, especially "$infile" and "$outfile", to work
>>> properly for filenames containing spaces. We do quote "$infile" when
>>> we're checking that it exists, just not (most of the time) when we
>>> actually use it!
>>
>> Updated.
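
For illustration only (this is not code from `rfcfold`), here is a minimal
sketch of why the quoting matters. The file name "my draft.txt" is a
hypothetical example.

```shell
#!/usr/bin/env bash
# Illustration only: "my draft.txt" is a hypothetical file name.
tmp_dir=$(mktemp -d)
infile="$tmp_dir/my draft.txt"
printf 'some content\n' > "$infile"

# Unquoted: the shell splits the value at the space and hands cat two
# nonexistent paths, so the command fails.
if cat $infile > /dev/null 2>&1; then unquoted=ok; else unquoted=failed; fi

# Quoted: the value is passed as one argument and the file is found.
if cat "$infile" > /dev/null 2>&1; then quoted=ok; else quoted=failed; fi

echo "unquoted: $unquoted, quoted: $quoted"
rm -rf "$tmp_dir"
```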
>>
>>
>>
>>> In addition to the above, I also share Alissa's (and Mirja's) concerns,
>>> but feel that Discuss is more appropriate than Abstain, so we can
>>> discuss what the best way to get this content published is. For it's
>>> fine content, and we should see it published; it's just not immediately
>>> clear to me what the right way to do so is.
>>
>> Agreed. For now, I've changed it to Informational, but I think there
>> remains a discussion around whether the draft should be rerun through the
>> IAB stream. My responses today to Alissa's Abstain and Suresh's Discuss
>> dig into this. Is it okay to use those threads for this item?
>
> Please do; this point was mostly intended to make sure that we didn't
> inadvertently approve the document while those discussions were still going
> on.
This issue is currently with the IESG.
>>> ----------------------------------------------------------------------
>>> COMMENT:
>>> ----------------------------------------------------------------------
>>>
>>> Section 4.1
>>>
>>> Automated folding of long lines is needed in order to support draft
>>> compilations that entail a) validation of source input files (e.g.,
>>> XML, JSON, ABNF, ASN.1) and/or b) dynamic generation of output, using
>>> a tool that doesn't observe line lengths, that is stitched into the
>>> final document to be submitted.
>>>
>>> I don't think the intended meaning of "source input files" will be
>>> clear to all readers just from this text. Some discussion of how RFCs
>>> can consider source code, data structures, generated output, etc., that
>>> have standalone representations and natural formats, and the need to
>>> display their contents in the RFC format that has different
>>> requirements might be helpful context for this paragraph and the next.
>>
>> Is the updated text more understandable?
>
> Yes, thanks
Great, this issue is closed.
>>> Section 7.1.2
>>>
>>> For some reason my mental model of "RFC style" does not use the word
>>> "really" in this way, and prefers alternatives like "very" or
>>> "exceptionally". (Also in Section 8.1.2.)
>>
>> s/Really/Exceptionally/ in both cases.
>>
>>
>>> Section 7.2.1
>>>
>>> 1. Determine where the fold will occur. This location MUST be
>>> before or at the desired maximum column, and MUST NOT be chosen such
>>> that the character immediately after the fold is a space (' ')
>>> character. For forced foldings, the location is between the
>>>
>>> This is a rather awkward natural line break. I suggest an RFC Editor
>>> note to make sure that the punctuation around the space character all
>>> appears on the same line.
>>
>> RFC Editor note added, near the top of the draft.
>>
>>
>>
>>> 3. On the following line, insert any number of space (' ')
>>> characters.
>>>
>>> I'm not sure I'd characterize the procedure as "complete" when it
>>> leaves the value of the output subject to implementation choice such as
>>> this. (Note that the next paragraph talks about the resulting
>>> "arbitrary number of space" characters, and would presumably also need
>>> to be adjusted if this text was adjusted.) We also don't seem to bound
>>> this number of spaces to be fewer than the target line length, which
>>> only matters in some weirdly pedantic sense.
>>
>> Added "subject to the resulting line not exceeding the desired maximum"
>> to both locations in the draft.
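
As a rough illustration of steps 1 and 3 as quoted above, here is a minimal
Bash sketch that folds one line with the '\' strategy. The function name
`fold_line` is hypothetical and this is not the `rfcfold` implementation. It
assumes the fold is marked by a trailing backslash, inserts zero spaces on
the continuation line, and ignores the header and any checks that would call
for the '\\' strategy instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fold_line <text> <max-column>
fold_line() {
  local line=$1 max=$2 cut
  while (( ${#line} > max )); do
    cut=$(( max - 1 ))           # leave one column for the backslash
    # Back up while the character right after the fold would be a space.
    while (( cut > 0 )) && [[ ${line:cut:1} == ' ' ]]; do
      cut=$(( cut - 1 ))
    done
    (( cut > 0 )) || return 1    # no valid fold location in this sketch
    printf '%s\\\n' "${line:0:cut}"
    line=${line:cut}
  done
  printf '%s\n' "$line"
}

fold_line 'abcd  efgh' 6
```

Each emitted line is at most the requested width, and no continuation line
starts with a space that belonged to the original content.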
>>
>>
>>
>>> Section 7.2.2
>>>
>>> Scan the beginning of the text content for the header described in
>>> Section 7.1.1. If the header is not present, starting on the first
>>> line of the text content, exit (this text contents does not need to
>>> be unfolded).
>>>
>>> I'm not sure I understand what "starting on the first line of the text
>>> content" is intended to mean. (Also in 8.2.2.)
>>
>> I think you are saying that it seems overly prescriptive, given that the
>> previous sentence says "beginning" and "header", it defies logic that the
>> header might not start on the first line and, by this text calling it
>> out, it suggests something special is going on. Is this what you mean?
>> To be clear, the only intention here is to catch the case whereby there
>> might be some blank lines preceding the header. Do you think the
>> "starting on the first line of the text content" fragment should be
>> removed?
>
> I think I was too confused by the text to be complaining that it was overly
> prescriptive :(
> I guess my complaint is that it seems ambiguous whether this is "the
> procedure says: start on the first line of text content, and check for the
> header" or "If the header is not present [anywhere in the content], start
> on the first line of content, and exit". That is, I think the order in
> which the clauses appear confuses me, with perhaps some exacerbation by
> verb tense. I support being able to cope with some blank lines preceding
> the header!
I have removed the "starting on the first line of the text content" fragment
from both 7.2.2 and 8.2.2, since it seemed unnecessary and caused confusion.
>>> Section 8.2.1
>>>
>>> If this text content needs to and can be folded, insert the header
>>> described in Section 8.1.1, ensuring that any additional printable
>>> characters surrounding the header do not result in a line exceeding
>>> the desired maximum.
>>>
>>> We discussed above some cases when text could not be folded using the
>>> algorithm from Section 7.2.1; in what case could text not be folded
>>> with this algorithm? Just the case when the implementation doesn't
>>> support forced folding?
>>
>> Yes, that's the only case known. But what does this have to do with
>> Section 8.2.1? Are you keying off of the "needs to" part? Is it okay?
>
> I was just trying to check that we have given the reader enough information
> to ascertain the "can be folded" result.
I wish to amend my previous statement. Other conditions that might prevent
text from being folded include:
1) Presence of a TAB character. This issue is already discussed in the draft.
2) Presence of ASCII control characters. This issue was not discussed
previously (nor in RFC 7991), but control characters in general (i.e., beyond
TAB) are a problem. That said, the problem may just be a limitation of
command-line tools like `sed`, which are byte-oriented rather than
character-oriented. In the latest update, the `rfcfold` script therefore
issues a *warning* if it detects any ASCII control characters.
3) Presence of non-ASCII (e.g., UTF-8) characters. This issue was not
discussed previously (nor in RFC 7991), but multibyte and multi-width
characters are not supported by `sed`. It is unclear from RFC 7991 and
RFC 7994 whether such characters may appear in <sourcecode> and <artwork>
inclusions, but presumably they may (e.g., the XML file format is known to
support UTF-8 encodings). To be safe, the `rfcfold` script now issues a
*warning* if it detects any non-ASCII characters.
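
For illustration, here is a minimal sketch of how such checks might look. The
function names are hypothetical and this is not the code in `rfcfold`. It
assumes GNU grep and forces the C locale so that matching is done byte by
byte.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are not taken from rfcfold.

# Succeeds if the file holds an ASCII control character other than TAB.
# (Line feeds are consumed by grep as line terminators and never match.)
has_control_chars() {
  tr -d '\t' < "$1" | LC_ALL=C grep -q '[[:cntrl:]]'
}

# Succeeds if the file holds any byte outside the 7-bit ASCII range.
has_non_ascii() {
  LC_ALL=C grep -q '[^[:print:][:cntrl:]]' "$1"
}

# Prints warnings on stderr; never aborts.
warn_on_suspect_chars() {
  if has_control_chars "$1"; then
    echo "Warning: ASCII control characters found in $1" >&2
  fi
  if has_non_ascii "$1"; then
    echo "Warning: non-ASCII characters found in $1" >&2
  fi
}
```

A caller would run `warn_on_suspect_chars "$infile"` before folding and
continue regardless of the result, since these are warnings rather than
errors.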
>>> Section 10
>>>
>>> We should warn against implementations scanning past the end of a
>>> buffer (containing the entire contents of a file) when checking what's
>>> in the beginning of the next line -- if a file ends with a backslash
>>> and "end of line" but no further content, we could perform an out of
>>> bounds access if the code assumes it is safe to check for the next
>>> line's initial content.
>>
>> Both Sections 7.2.2 and 8.2.2 describe conditions to determine when
>> unfolding occurs. AFAICT, in both cases, the unfolding algorithm stays
>> within the bounds of those conditions.
>
> These procedures are fine if you're operating in a context where you
> interact with the text corpus via "get next line" operations. But I don't
> think we have limited ourselves to such contexts; consider the case where I
> (foolishly) write text-processing code in C, and read(2) the text in
> question into a memory buffer. I'm on my own for linebreak detection, and
> if I start peeking past escape characters, it's not so hard to imagine that
> I could fail to check for "end of buffer" and trigger undefined behavior.
>
>> For instance, given the input sequence [ '\' '\n' EOF] , the 7.2.2
>> algorithm would replace it with [ EOF ] and the 8.2.2 algorithm wouldn't
>> even attempt to unfold it since the condition of the next line containing
>> a second '\' character isn't met.
>>
>> Is this Security Consideration needed?
>
> Well, it's a nonblocking comment. So if the above description seems
> totally implausible to you, I can accept it not being included in the
> document.
I'll choose this route, thanks.
>>> Section 12.2
>>>
>>> I think that RFC 7991 could be normative, since we say "per RFC 7991"
>>> to describe some requirements on behavior. Likewise for RFC 7994,
>>> whose character encoding requirements we incorporate by reference.
>>
>> Given that this format may be used in contexts outside the IETF, it seems
>> that understanding RFC 7991 is optional. Agreed?
>
> For most of the occurrences of 7991 references, I agree with you. The only
> one that makes me think otherwise is in Section 7.1.2:
>
> The character encoding is the same as described in Section 2 of
> [RFC7994], except that, per [RFC7991], tab characters are prohibited.
>
> which is a statement of behavior that defers to an external specification.
Okay, RFC 7991 is now a normative reference.
>>> Appendix A
>>>
>>> I could perhaps argue that we should include a reference to POSIX for
>>> "POSIX shell script" but find it somewhat hard to believe that this
>>> would be a problem in practice. It's also moot since we require bash
>>> specifically, so we'd need to reference bash instead of POSIX.
>>
>> Per above, "POSIX" is now "Bash" in the title. I added an Informative
>> reference for Bash.
>
> Thanks!
>
>>
>>> copy/paste the script for local use. As should be evident by the
>>> lack of the mandatory header described in Section 7.1.1, these
>>> backslashes do not designate a folded line, such as described in
>>> Section 7.
>>>
>>> It perhaps should be, but I think currently is not -- we only talk
>>> about using the two-line header to detect instances of folding, without
>>> mention of a requirement to be contained within <CODE BEGINS>/<CODE
>>> ENDS> or similar.
>>
>> Correct. The 2-line header is missing. That <CODE BEGINS>/<CODE ENDS>
>> appears is secondary. Is there anything to be done here?
>
> In light of the previous discussion about extracting artwork individually
> from the document, probably not.
Okay, this issue is closed.
> Though it seems the -10 has added a line-wrapping header to the script,
> which seems to be inadvertent, if I understand correctly.
That was a mistake. The authors have added a build-time test case ensuring
that the `rfcfold` script doesn't require folding when it appears in
Appendix A.
>>> It seems that my perception of "common shell style" diverges from that
>>> presented in this document, which is not necessarily problematic.
>>> (Things like what diagnostics go to stdout vs. stderr, use of ">
>>> /dev/null" vs ">> /dev/null", etc.)
>>
>> I fixed one "> /dev/null" case.
>
> Heh, I was trying to say that I prefer to always write "> /dev/null", while
> acknowledging that my preference is irrelevant for this document. I'm glad
> it helped to fix a consistency nit, though!
The script now uses "> /dev/null" throughout.
>> As for style, we could review line by line but, for the cases where
>> output is directed to /dev/null, it's unclear where the output is
>> needed; only the exit status ever seems to matter.
>>
>>
>>> printf "Usage: rfcfold [-s <strategy>] [-c <col>] [-r] -i <infile>"
>>> printf " -o <outfile>\n"
>>>
>>> This summary usage line doesn't mention -d, -q, or -h. (Maybe it
>>> doesn't have to, of course.)
>>
>> Added.
>>
>>
>>>   # ensure input file doesn't contain a TAB
>>>   grep $'\t' $infile >> /dev/null 2>&1
>>>
>>> (`grep -q` is a thing, here and elsewhere.)
>>
>> Added.
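
For illustration, a minimal sketch of the TAB check written with `grep -q`
and a quoted file name. The function name `contains_tab` is hypothetical and
this is not necessarily how the updated script is written.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the file contains a TAB character.
# grep -q prints nothing and reports the result through its exit status,
# so no redirection to /dev/null is needed.
contains_tab() {
  grep -q $'\t' "$1"
}
```

A caller would test it directly, for example
`if contains_tab "$infile"; then ...; fi`.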
>>
>>
>>>   # unfold wip file
>>>   "$SED" '{H;$!d};x;s/^\n//;s/\\\n *//g' $temp_dir/wip > $outfile
>>>
>>> [I don't remember why the s/^\n// is needed; similarly for the
>>> unfold_it_2() case.]
>>
>> Erik responded to this point already.
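
For readers of the archive, here is a small sketch of what the quoted `sed`
program does. It assumes GNU sed, and the wrapper function names are
hypothetical. `H` appends a newline plus the current line to the hold space,
which starts out empty, so the accumulated text begins with a stray newline.
`s/^\n//` removes it before the folds are joined.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around the sed program quoted above.
unfold() {
  sed '{H;$!d};x;s/^\n//;s/\\\n *//g'
}

# Same program without s/^\n//, to show the stray leading newline.
unfold_without_strip() {
  sed '{H;$!d};x;s/\\\n *//g'
}

printf 'abc\\\n   def\nxyz\n' | unfold
printf 'abc\\\n   def\nxyz\n' | unfold_without_strip
```

The first call joins the folded pair back into a single line, while the
second produces the same text preceded by an empty line.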
>>
>>
>>>   if [[ $strategy -eq 2 ]]; then
>>>     min_supported=`expr ${#hdr_txt_2} + 8`
>>>   else
>>>     min_supported=`expr ${#hdr_txt_1} + 8`
>>>   fi
>>>
>>> On the face of it this seems like it will produce "folded" output that
>>> exceeds the line length, when we give min_supported of 54, use
>>> autodetection of strategy, and have input that is incompatible with
>>> fold_it_1().
>>
>> Fixed off-by-one error.
>>
>>
>>
>>> process_input $@
>>>
>>> Need double-quotes around "$@" to properly handle arguments with
>>> embedded spaces.
>>
>> Added.
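
For illustration, a minimal sketch of the difference. The `process_input`
here is a hypothetical stand-in that only counts its arguments, not the
function from the script.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in: reports how many arguments it received.
process_input() { echo "$#"; }

call_unquoted() { process_input $@; }    # arguments are re-split on spaces
call_quoted()   { process_input "$@"; }  # each argument is passed intact

call_unquoted -i "my draft.txt" -o out.txt   # prints 5
call_quoted   -i "my draft.txt" -o out.txt   # prints 4
```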
>
> Thanks!
>
> I'll try to find time to look at the new script with an eye for quoting,
> and update my position in the datatracker; please start complaining if I
> haven't done so and the other threads about where/how to publish have come
> to a conclusion.
>
> -Ben
Please let us know if you see any other issues needing to be addressed!
Thanks,
Kent // co-author