I think what this needs is a bit more principled approach to the task of 
reformatting. I propose the following model, based roughly on what it looks 
like you are implementing in this patch:

  The task is broken up into two main parts governed by orthogonal policies 
(and implemented in their own, appropriately named file(s)).

  #  Identify "logical lines" (what you call "continuations"). The way to think 
about this is that if theoretically there were no line length limit, then how 
would the source be laid out. The tokens which end up being on each "logical 
line" are handled separately. Your comment about "the key property" in the 
comment above Continuation is truly pure gold: basically, what this says is 
that identifying logical lines is orthogonal to...
  # Wrapping logical lines to the desired physical source width. This has an 
independent set of policy decisions associated with it and deals with a 
completely different set of abstractions. Really the only input from from the 
last step is how deep this logical line is indented (and hence how much to 
subtract from the available physical line length).

  Basically, the task and corresponding configuration policy associated with 
each part can be cleanly separated. The configurability at each step is:

  # How to identify logical lines.
    * There can be a predetermined set of locations at which we call "ship out 
the next logical line of accumulated tokens", and during the "parsing", at key 
places we do like `if (LineBreakConf.shouldBreakBeforeIfBrace()) 
shipOutNextLogicalLine();`.
    This is an easily extensible mechanism and provides a clear *model* for 
extensibility.
    * if we really want to get crazy, we can offer a callback here and let the 
user decide on a case-by case basis.

  # How to wrap logical lines to a physical source width and place tokens.
    * This is a bit more complicated. Once again, we can provide a set of 
predetermined places to break (e.g. "after `&&`") a set of predetermined 
policies for how to align wrapped lines, and how to place tokens ("put the * on 
the right"), along with relative weights for each one. These can all be boiled 
down into a set of weights parametrizing where to break, how to align things, 
and where to put spaces (similar to TeX's notion of "badness"). Actually, this 
is basically the same task as what TeX does when breaking lines to put together 
paragraphs, so rather than rolling your own line breaking algorithm, you might 
consider just using the TeX algorithm (see Knuth & Plass, "Breaking Paragraphs 
into Lines", Software Practice and Experience, 1981).

  There is probably a use case for a "peephole" pass at the end, to deal with 
things like "no braces if the `if` only has a single statement for its body".

  Having a model like this is a HUGE boon for understandability, usability, and 
maintenance of the library.


================
Comment at: lib/Format/ContinuationParser.h:9-12
@@ +8,6 @@
+//===----------------------------------------------------------------------===//
+//
+//  This is EXPERIMENTAL code under heavy development. It is not in a state 
yet,
+//  where it can be used to format real code.
+//
+//===----------------------------------------------------------------------===//
----------------
http://llvm.org/docs/CodingStandards.html#file-headers

Describe the file. Describe the "theory of operations".

================
Comment at: lib/Format/ContinuationParser.h:49-55
@@ +48,9 @@
+
+/// \brief A continuation is a sequence of \c Token, that we would like to put
+/// on a single line if there was no column limit.
+///
+/// This is used as a main interface between the \c ContinuationParser and the
+/// \c ContinuationFormatter. The key property is that changing the formatting
+/// within a continuation does not affect any other continuation.
+struct Continuation {
+  Continuation() : Level(0) {}
----------------
Why is this called "continuation"? The name doesn't seem to have anything to do 
with the description; the only meaning of "continuation" as a noun that I can 
think of is "line continuation", which is the separating characters that join 
the logical line across physical source lines (e.g. 
http://msdn.microsoft.com/en-us/library/aa711641(v=vs.71).aspx).

How about "LogicalLine"?

Your sentence "the key property" is pure gold, btw.

================
Comment at: lib/Format/ContinuationParser.h:58-59
@@ +57,4 @@
+
+  /// \brief The \c Token comprising this continuation.
+  std::vector<FormatToken> Tokens;
+
----------------
Prefer SmallVector

================
Comment at: lib/Format/ContinuationParser.cpp:9-12
@@ +8,6 @@
+//===----------------------------------------------------------------------===//
+//
+//  This is EXPERIMENTAL code under heavy development. It is not in a state 
yet,
+//  where it can be used to format real code.
+//
+//===----------------------------------------------------------------------===//
----------------
Proper file header needed here too.

================
Comment at: lib/Format/Format.cpp:45
@@ +44,3 @@
+    State.Indent.push_back(Cont.Level * 2 + 4);
+    for (unsigned i = 1, e = Cont.Tokens.size(); i != e; ++i) {
+      bool InsertNewLine = Cont.Tokens[i].NewlinesBefore > 0;
----------------
http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly

I and N are the canonical names for integer loop variables (I and E moreso for 
iterators).

also, comment why you are iterating starting at 1.

================
Comment at: lib/Format/Format.cpp:58
@@ +57,3 @@
+private:
+  /// \brief The current state when indenting a continuation.
+  ///
----------------
Ugh. Yeah, "logical line" would be a lot better.

================
Comment at: lib/Format/Format.cpp:86
@@ +85,3 @@
+
+  // Append the token at 'Index' to the IndentState 'State'.
+  void addToken(unsigned Index, bool Newline, bool DryRun, IndentState &State) 
{
----------------
Doxygenify


http://llvm-reviews.chandlerc.com/D80
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Reply via email to