Here's the draft PEP I wrote up:

Abstract

Triple-quoted string (TQS henceforth) literals in Python preserve the
formatting of the literal string including newlines and whitespace.
When a programmer desires no leading whitespace for the lines in a TQS,
he must align all lines but the first in the first column, which
differs from the syntactic indentation when a TQS occurs within an
indented block. This PEP addresses this issue.

Motivation

TQS's are generally used in two distinct manners: as multiline text
used by the program (typically command-line usage information displayed
to the user) and as docstrings.

Here's a hypothetical but fairly typical example of a TQS as a
multiline string:

    if not interactive_mode:
        if not parse_command_line():
            print """usage: UTIL [OPTION] [FILE]...
    try `util -h' for more information."""
            sys.exit(1)

Here the second line of the TQS begins in the first column, which at a
glance appears to occur after the close of both "if" blocks. This
results in a discrepancy between how the code is parsed and how the
user initially sees it, forcing the user to jump the mental hurdle in
realising that the call to sys.exit() is actually within the second
"if" block.

Docstrings on the other hand are usually indented to be more readable,
which causes them to have extraneous leading whitespace on most lines.
To counteract the problem, PEP 257 [1] specifies a standard algorithm
for trimming this whitespace.

In the end, the programmer is left with a dilemma: either to align the
lines of his TQS to the first column, and sacrifice readability; or to
indent it to be readable, but have to deal with unwanted whitespace.

This PEP proposes that TQS's should have a certain amount of leading
whitespace trimmed by the parser, thus avoiding the drawbacks of the
current behaviour.

Specification

Leading whitespace in TQS's will be dealt with in a similar manner to
that proposed in PEP 257:

    "...
    strip a uniform amount of indentation from the second and further
    lines of the [string], equal to the minimum indentation of all
    non-blank lines after the first line. Any indentation in the first
    line of the [string] (i.e., up to the first newline) is
    insignificant and removed. Relative indentation of later lines in
    the [string] is retained."

Note that a line within the TQS that is entirely blank or consists
only of whitespace will not count toward the minimum indent, and will
be retained as a blank line (possibly with some trailing whitespace).

There are several significant differences between this proposal and
PEP 257's docstring parsing algorithm:

* This proposal considers all lines to end at the next newline in the
  source code (whether escaped or not); PEP 257's algorithm only
  considers lines to end at the next (necessarily unescaped) newline in
  the parsed string.

* Only literal whitespace is counted; an escape such as \x20 will not
  be counted as indentation.

* Tabs are not converted to spaces.

* Blank lines at the beginning and end of the TQS will *not* be
  stripped.

* Leading whitespace on the first line is preserved, as is trailing
  whitespace on all lines.

Rationale

I considered several different ways of determining the amount of
whitespace to be stripped, including:

1. Determined by the column (after allowing for expanded tabs) of the
   triple-quote:

       myverylongvariablename = """\
                                  This line is indented,
                                But this line is not.
                                Note the trailing newline:
                                """

   + Easily allows all lines to be indented.
   - Easily leads to problems due to re-alignment of all but first
     line when mixed tabs and spaces are used.
   - Forces programmers to use a particular level of indentation for
     continuing TQS's.
   - Unclear whether the lines should align with the triple-quote or
     immediately after it.
   - Not backward compatible with most non-docstrings.

2.
   Determined by the indent level of the second line of the string:

       myverylongvariablename = """\
           This line is not indented (and has no leading newline),
             But this one is.
           Note the trailing newline:
           """

   + Allows for flexible alignment of lines.
   + Mixed tabs and spaces should be fine (as long as they're
     consistent).
   - Cannot support an indent on the second line of the string (very
     bad!).
   - Not backward compatible with most non-docstrings.

3. Determined by the minimum indent level of all lines after the
   first:

       myverylongvariablename = """\
             This line is indented,
           But this line is not.
           Note the trailing newline:
           """

   + Allows for flexible alignment of lines.
   + Mixed tabs and spaces should be fine (as long as they're
     consistent).
   + Backward compatible with all docstrings and a majority of
     non-docstrings.
   - Support for indentation on all lines not immediately obvious.

Overall, solution 3 provided the best balance of features, and
(importantly) had the best backward compatibility. I thus consider it
the most suitable.

Examples

The examples here are set out in pairs: the first of each pair shows
how the TQS must be currently written to avoid indentation issues; the
second shows how it can be written using this proposal (although some
variation is possible). All examples are taken or adapted from the
Python standard library or another real source.

1. Command-line usage information:

       def usage(outfile):
           outfile.write("""Usage: %s [OPTIONS] <file> [ARGS]

       Meta-options:
       --help     Display this help then exit.
       --version  Output version information then exit.
       """ % sys.argv[0])

       #------------------------#

       def usage(outfile):
           outfile.write("""Usage: %s [OPTIONS] <file> [ARGS]

               Meta-options:
               --help     Display this help then exit.
               --version  Output version information then exit.
               """ % sys.argv[0])

2.
   Embedded Python code:

               self.runcommand("""if 1:
               import sys as _sys
               _sys.path = %r
               del _sys
               \n""" % (sys.path,))

       #------------------------#

               self.runcommand("""\
                   if 1:
                       import sys as _sys
                       _sys.path = %r
                       del _sys
                   \n""" % (sys.path,))

3. Unit testing:

       class WrapTestCase(BaseTestCase):
           def test_subsequent_indent(self):
               # Test subsequent_indent parameter
               expect = '''\
         * This paragraph will be filled, first
           without any indentation, and then
           with some (including a hanging
           indent).'''
               result = fill(self.text, 40,
                             initial_indent="  * ",
                             subsequent_indent="    ")
               self.check(result, expect)

       #------------------------#

       class WrapTestCase(BaseTestCase):
           def test_subsequent_indent(self):
               # Test subsequent_indent parameter
               expect = '''\
                   * This paragraph will be filled, first
                     without any indentation, and then
                     with some (including a hanging
                     indent).\
                 '''
               result = fill(self.text, 40,
                             initial_indent="  * ",
                             subsequent_indent="    ")
               self.check(result, expect)

Example 3 illustrates how indentation of all lines (by 2 spaces) is
achieved with this proposal: the position of the closing triple quote
is used to determine the minimum indentation for the whole string. To
avoid a trailing newline in the string, the final newline is escaped.

Example 2 avoids the need for this construction by placing the first
line (which is not indented) on the line after the triple-quote, and
escaping the leading newline.

Backwards Compatibility

Uses of TQS's fall into two broad categories: those where indentation
is significant, and those where it is not. Those in the latter
(larger) category, which includes all docstrings, will remain
effectively unchanged under this proposal. Docstrings in particular
are usually trimmed according to the rules in PEP 257 before their
value is used; the trimmed strings will be the same under this
proposal as they are now.
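
The docstring claim above can be checked informally. The sketch below
is only an illustration, not the implementation mentioned in this PEP:
proposed_trim is a hypothetical helper that approximates the proposed
trimming on an already-evaluated string (so it ignores escapes and
line continuations, which the real proposal handles at the source
level), and it assumes inspect.cleandoc from the current standard
library as the PEP 257 trimmer. It is written for Python 3.

```python
import inspect


def proposed_trim(value):
    """Hypothetical runtime approximation of the proposed trimming."""
    first, *rest = value.split("\n")
    if not rest:
        return value  # single-line string: nothing to trim

    def indent(line):
        return len(line) - len(line.lstrip(" \t"))

    body, last = rest[:-1], rest[-1]
    # Whitespace-only lines do not count toward the minimum, except the
    # final one: in source it carries the closing quotes, so it counts.
    counted = [ln for ln in body if ln.strip(" \t")] + [last]
    margin = min(indent(ln) for ln in counted)
    # Whitespace-only lines are kept as they are; others lose the margin.
    body = [ln[margin:] if ln.strip(" \t") else ln for ln in body]
    return "\n".join([first] + body + [last[margin:]])


DOC = """Summarise the thing.

    Longer description,
        with a nested line.
    """

trimmed = proposed_trim(DOC)
print(repr(trimmed))
# PEP 257 trimming gives the same docstring before and after.
print(inspect.cleandoc(DOC) == inspect.cleandoc(trimmed))
```

The raw string value changes (the common four-space margin is gone),
but the PEP 257 result is identical, which is the sense in which
docstrings remain effectively unchanged.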

Of the former category, the majority are those which have at least one
line beginning in the first column of the source code; these will be
entirely unaffected if left alone, but may be reformatted to increase
readability (see example 1 above).

However, a small number of strings in this first category depend on
all lines (or all but the first) being indented. Under this proposal,
these will need to be edited to ensure that the intended amount of
whitespace is preserved. Examples 2 and 3 above show two different
ways to reformat the strings for these cases. Note that in both
examples, the overall indentation of the code is cleaner, producing
more readable code.

Some evidence may be desired to support the claims made above
regarding the distribution of the different uses of TQS's. I have
begun some analysis to produce some statistics for these; while still
incomplete, I have some initial results for the Python 2.4.1 standard
library (these figures should not be off by more than a small margin):

In the standard library (some 396,598 lines of Python code), there are
7,318 occurrences of TQS's, an average rate of one per 54 lines. Of
these, 6,638 (90.7%) are docstrings; the remaining 680 (9.3%) are not.
A further examination shows that only 64 of these (0.9% of all TQS's)
have leading indentation on all lines (the only case where the
proposed solution is not backward compatible). These must be manually
checked to determine whether they will be affected; such a check
reveals only 7-15 TQS's (0.1%-0.2% of all TQS's) that actually need to
be edited.

Although small, the impact of this proposal on compatibility is still
more than negligible; if accepted in principle, it might be better
suited to be initially implemented as a __future__ feature, or perhaps
relegated to Python 3000.

Implementation

An implementation for this proposal has been made; however I have not
yet made a patch file with the changes, nor do the changes yet extend
to the documentation or other affected areas.
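
For anyone wanting to repeat the survey described under Backwards
Compatibility, it could be approximated along the following lines.
This is a sketch under assumptions, not the script used for the 2.4.1
figures: survey and its helpers are hypothetical names, it relies on
the standard tokenize module under Python 3, and it only flags
candidate strings (every non-blank line after the first is indented),
which would still need the manual check described earlier.

```python
import io
import tokenize


def is_triple_quoted(literal):
    # Drop any string prefix (r, u, b, ...) before looking at the quotes.
    body = literal.lstrip("rRuUbBfF")
    return body.startswith('"""') or body.startswith("'''")


def all_later_lines_indented(literal):
    # Source lines of the literal after the first, ignoring blank ones.
    # The last line includes the closing quotes, so their column counts.
    later = [ln for ln in literal.splitlines()[1:] if ln.strip()]
    return bool(later) and all(ln[0] in " \t" for ln in later)


def survey(source):
    """Return (number of TQS's, number with all later lines indented)."""
    total = indented = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Note: f-strings are tokenized differently on newer Pythons
        # and are not counted by this sketch.
        if tok.type == tokenize.STRING and is_triple_quoted(tok.string):
            total += 1
            if all_later_lines_indented(tok.string):
                indented += 1
    return total, indented


SAMPLE = '''\
def f():
    """Docstring.

    More text.
    """
    usage = """usage: UTIL
try -h for help."""
    return usage
'''

print(survey(SAMPLE))
```

In the sample, the docstring is flagged as a candidate (all later
lines indented) while the usage string is not, since its second line
starts in the first column. Separating docstrings from other TQS's, as
the figures above do, would need an extra step not shown here.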

References

[1] PEP 257, Docstring Conventions, David Goodger, Guido van Rossum
    http://www.python.org/peps/pep-0257.html

Copyright

This document has been placed in the public domain.

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev