I think what this needs is a bit more principled approach to the task of
reformatting. I propose the following model, based roughly on what it looks
like you are implementing in this patch:
The task is broken up into two main parts governed by orthogonal policies
(and implemented in their own, appropriately named file(s)).
# Identify "logical lines" (what you call "continuations"). The way to think
about this is that if theoretically there were no line length limit, then how
would the source be laid out. The tokens which end up being on each "logical
line" are handled separately. Your comment about "the key property" in the
comment above Continuation is truly pure gold: basically, what this says is
that identifying logical lines is orthogonal to...
# Wrapping logical lines to the desired physical source width. This has an
independent set of policy decisions associated with it and deals with a
completely different set of abstractions. Really the only input from from the
last step is how deep this logical line is indented (and hence how much to
subtract from the available physical line length).
Basically, the task and corresponding configuration policy associated with
each part can be cleanly separated. The configurability at each step is:
# How to identify logical lines.
* There can be a predetermined set of locations at which we call "ship out
the next logical line of accumulated tokens", and during the "parsing", at key
places we do like `if (LineBreakConf.shouldBreakBeforeIfBrace())
shipOutNextLogicalLine();`.
This is an easily extensible mechanism and provides a clear *model* for
extensibility.
* if we really want to get crazy, we can offer a callback here and let the
user decide on a case-by case basis.
# How to wrap logical lines to a physical source width and place tokens.
* This is a bit more complicated. Once again, we can provide a set of
predetermined places to break (e.g. "after `&&`") a set of predetermined
policies for how to align wrapped lines, and how to place tokens ("put the * on
the right"), along with relative weights for each one. These can all be boiled
down into a set of weights parametrizing where to break, how to align things,
and where to put spaces (similar to TeX's notion of "badness"). Actually, this
is basically the same task as what TeX does when breaking lines to put together
paragraphs, so rather than rolling your own line breaking algorithm, you might
consider just using the TeX algorithm (see Knuth & Plass, "Breaking Paragraphs
into Lines", Software Practice and Experience, 1981).
There is probably a use case for a "peephole" pass at the end, to deal with
things like "no braces if the `if` only has a single statement for its body".
Having a model like this is a HUGE boon for understandability, usability, and
maintenance of the library.
================
Comment at: lib/Format/ContinuationParser.h:9-12
@@ +8,6 @@
+//===----------------------------------------------------------------------===//
+//
+// This is EXPERIMENTAL code under heavy development. It is not in a state
yet,
+// where it can be used to format real code.
+//
+//===----------------------------------------------------------------------===//
----------------
http://llvm.org/docs/CodingStandards.html#file-headers
Describe the file. Describe the "theory of operations".
================
Comment at: lib/Format/ContinuationParser.h:49-55
@@ +48,9 @@
+
+/// \brief A continuation is a sequence of \c Token, that we would like to put
+/// on a single line if there was no column limit.
+///
+/// This is used as a main interface between the \c ContinuationParser and the
+/// \c ContinuationFormatter. The key property is that changing the formatting
+/// within a continuation does not affect any other continuation.
+struct Continuation {
+ Continuation() : Level(0) {}
----------------
Why is this called "continuation"? The name doesn't seem to have anything to do
with the description; the only meaning of "continuation" as a noun that I can
think of is "line continuation", which is the separating characters that join
the logical line across physical source lines (e.g.
http://msdn.microsoft.com/en-us/library/aa711641(v=vs.71).aspx).
How about "LogicalLine"?
Your sentence "the key property" is pure gold, btw.
================
Comment at: lib/Format/ContinuationParser.h:58-59
@@ +57,4 @@
+
+ /// \brief The \c Token comprising this continuation.
+ std::vector<FormatToken> Tokens;
+
----------------
Prefer SmallVector
================
Comment at: lib/Format/ContinuationParser.cpp:9-12
@@ +8,6 @@
+//===----------------------------------------------------------------------===//
+//
+// This is EXPERIMENTAL code under heavy development. It is not in a state
yet,
+// where it can be used to format real code.
+//
+//===----------------------------------------------------------------------===//
----------------
Proper file header needed here too.
================
Comment at: lib/Format/Format.cpp:45
@@ +44,3 @@
+ State.Indent.push_back(Cont.Level * 2 + 4);
+ for (unsigned i = 1, e = Cont.Tokens.size(); i != e; ++i) {
+ bool InsertNewLine = Cont.Tokens[i].NewlinesBefore > 0;
----------------
http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly
I and N are the canonical names for integer loop variables (I and E moreso for
iterators).
also, comment why you are iterating starting at 1.
================
Comment at: lib/Format/Format.cpp:58
@@ +57,3 @@
+private:
+ /// \brief The current state when indenting a continuation.
+ ///
----------------
Ugh. Yeah, "logical line" would be a lot better.
================
Comment at: lib/Format/Format.cpp:86
@@ +85,3 @@
+
+ // Append the token at 'Index' to the IndentState 'State'.
+ void addToken(unsigned Index, bool Newline, bool DryRun, IndentState &State)
{
----------------
Doxygenify
http://llvm-reviews.chandlerc.com/D80
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits