[PATCH] D128486: [pseudo] Add error-recovery framework & brace-based recovery

Haojian Wu via Phabricator via cfe-commits Wed, 29 Jun 2022 03:43:46 -0700

hokein added a comment.

the patch looks a good start to me, some initial comments (mostly around the 
recovery part).




================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/GLR.h:150
+// OldHeads is the parse state at TokenIndex.
+// This function consumes consumes zero or more tokens (advancing TokenIndex),
+// and places any recovery states created in NewHeads.
----------------
If I read it correctly, consuming zero token is consider failure of the 
function, right?


================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/grammar/Grammar.h:134
+  uint8_t RecoveryIndex = -1;
+  RecoveryStrategy Recovery = RecoveryStrategy::None;
+
----------------
So each rule will have at most 1 recovery-strategy (it is fine for the initial 
version), but I think in the future we want more (probably we need to change 
the Sequence to an array of `{SymbolID, RevoeryStrategy}`).

```
selection-statement := IF CONSTEXPR_opt ( init-statement_opt condition ) 
statement ELSE statement
```
We might want different recoveries in `( . init-statement_opt condition )` 
`(init-statement_opt condition) . statement`, `ELSE . statement`.





================
Comment at: clang-tools-extra/pseudo/include/clang-pseudo/grammar/LRTable.h:248
+  // A given state has [RecoveryOffset[S], RecoveryOffset[S+1]).
+  std::vector<uint32_t> RecoveryOffset;
+  std::vector<Recovery> Recoveries;
----------------
I see the motivation of the `OffsetTable` structure now, this would come as a 
follow-up to simplify the `ReduceOffset` and `RecoveryOffset`, right?


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:29
+
+llvm::Optional<unsigned>
+findRecoveryEndpoint(RecoveryStrategy Strategy,
----------------
nit: unsigned => `Token::Index`


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:44
+
+void glrRecover(llvm::ArrayRef<const GSS::Node *> OldHeads,
+                unsigned &TokenIndex, const TokenStream &Tokens,
----------------
The `GLR.cpp` file is growing significantly, I think the recovery part is large 
enough to be lived in a separate file `GLRRecovery.cpp`, but the declaration 
can still be in the `GLR.h`.


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:68
+    // one arbitrarily.
+    std::vector<const ForestNode *> Path;
+  };
----------------
nit: maybe name it `Parses` or `PartialParses`. Path make me think this is a 
patch of GSS nodes.


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:83
+  //    expr := type ( . expr )   - (up) we should recover this outer expr
+  //    expr := . type ( expr )   - (up+left) we should not recover this type!
+  //
----------------
I think you're right -- I thought the first GSS node with a recovery state we 
encounter during the Walkup  state is the node we want. 

This example is a little confusing (it still matches our previous mental 
model), I didn't get it until we discussed offline. I think the following 
example is clearer 

```
parsing the text `if (true) else ?`

IfStmt := if (stmt) else . stmt - which we're currently parsing
IfStmt := if (.stmt) else stmt  - (left) the first recovery GSS node, should 
not recover from this
IfStmt := . if (stmt) else stmt - (up), we should recover from this
```

I also think it is worth to add it into the test.


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:88
+  // For now, we have to take this into account when defining recovery rules.
+  // FIXME: find a more satisfying way to avoid such false recovery.
+  std::vector<const ForestNode *> Path;
----------------
I can't think of a better solution other than a search-based one (would like to 
think more about it).

Currently, we find all recovery options by traversing the whole GSS, and do a 
post-filter (based on the Start, and End). I might do both in the DFS (which 
will save some cost of traversing), the DFS will give us the best recovery 
options, and then we build the GSS node, and forest node.  But up to you.


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:111
+  };
+  for (auto *N : llvm::reverse(OldHeads))
+    DFS(N, TokenIndex, DFS);
----------------
any particular reason why we iterate the OldHeads in a reversed order?


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:130
+
+  assert(NewHeads.empty()); // We may repeatedly populate and clear it.
+  llvm::Optional<Token::Range> RecoveryRange;
----------------
nit: might be clearer to move the assertion to the beginning of the function.


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:174
+                            << " strategies\n");
+    ++TokenIndex;
+  }
----------------
Advancing the TokenIndex here seems a little surprising and doesn't match what 
the comment says (` On failure, NewHeads is empty and TokenIndex is 
unchanged.`). I think this should be done in caller side.


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:64
+    // These represent the partial parse that is being discarded.
+    // They should become the children of the opaque recovery node.
+    //
----------------
this is not implemented, right? Add a FIXME?


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:91
+  llvm::DenseSet<const GSS::Node *> Seen;
+  auto DFS = [&](const GSS::Node *N, Token::Index NextTok, auto &DFS) {
+    if (!Seen.insert(N).second)
----------------
nit: Walkup seems a bit clearer than DFS. 


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:133
+  for (const PlaceholderRecovery &Option : Options) {
+    // If this starts further right than options we've already found, then
+    // we'll never find anything better. Skip computing End for the rest.
----------------
`further right` should be `further left`, right?


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:149
+      if (RecoveryRange->End > *End)
+        NewHeads.clear();
+    }
----------------
If we find a better recovery option, we're throwing all the newly-built heads 
and forest nodes, this seems wasteful. I think we can avoid it by first finding 
the best solutions and creating new heads and gss nodes for them.


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:150
+        NewHeads.clear();
+    }
+    // Create recovery nodes and heads for them in the GSS. These may be
----------------
The Line135-Line150 code looks like a good candidate for  an `evaluateOption` 
function. 


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:590
+    if (NextHeads.empty()) {
+      glrRecover(Heads, I, Tokens, Params, NextHeads);
+      if (NextHeads.empty())
----------------
currently, it is fine for bracket-based recovery. In the future, we will need 
to run a force reduction for `Heads` regardless of the lookahead token before 
running the glrRecover, add a FIXME?


================
Comment at: clang-tools-extra/pseudo/lib/GLR.cpp:591
+      glrRecover(Heads, I, Tokens, Params, NextHeads);
+      if (NextHeads.empty())
+        // FIXME: Ensure the `_ := start-symbol` rules have a fallback
----------------
If the glrRecover returns a bool indicating whether we're able to recover, then 
the code is simpler:

```
if (NextHeads.empty() && !glrRecover(...))
  return Params.Forest.createOpaque(...);
```



================
Comment at: clang-tools-extra/pseudo/unittests/GLRTest.cpp:382
+  auto *Question1 = &Arena.createTerminal(tok::question, 1);
+  const auto *Root = GSStack.addNode(0, nullptr, {});
+  const auto *OpenedBraces = GSStack.addNode(1, LBrace, {Root});
----------------
nit: for GSS nodes, I will name them like `GSSNode<StateID>`, it is less 
describable, but it can be easily distinguished from other forest nodes, and 
match the GSS described in the comment, which I think it is clearer.


================
Comment at: clang-tools-extra/pseudo/unittests/GLRTest.cpp:425
+  // Innermost brace is unmatched, to test fallback to next brace.
+  TokenStream Tokens = cook(lex("{ { { ? ? } }", LOptions), LOptions);
+  Tokens.tokens()[0].Pair = 5;
----------------
I suppose there is an extra `?` in the string literal, the index of the 
brackets (`4`, `5`) used below doesn't match the string literal here.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128486/new/

https://reviews.llvm.org/D128486

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D128486: [pseudo] Add error-recovery framework & brace-based recovery

Reply via email to