cor3ntin added inline comments.
================ Comment at: clang/lib/Lex/Lexer.cpp:3386 + + if (!Result || CurPtr - StartPtr == (ptrdiff_t)(Buffer.size() + 4)) StartPtr = CurPtr; ---------------- tahonermann wrote: > cor3ntin wrote: > > tahonermann wrote: > > > cor3ntin wrote: > > > > cor3ntin wrote: > > > > > tahonermann wrote: > > > > > > This is a bit of a tangent, but under what circumstances would > > > > > > `CurPtr - StartPtr` not be equal to `Buffer.size() + 4`? Actually, > > > > > > I'm not sure that +4 is correct. It looks like `StartPtr` is > > > > > > expected to point to `N` at the beginning of the function, so the > > > > > > initial `\` is not included in the range. If `N` isn't seen, the > > > > > > function returns early. Likewise, if either of the `{` or `}` > > > > > > delimiters is not found, the function returns early. > > > > > > > > > > > > I think the goal of this code is to skip the `getAndAdvanceChar()` > > > > > > machinery when the buffer is being claimed (no need to parse UCNs > > > > > > or trigraphs in the character name), but it looks like, > > > > > > particularly with this DR, that we might always be able to do that. > > > > > afaict, 4 is correct here because we are one past-the-end. > > > > > I do however agree that this whole pattern which is used a few times > > > > > is probably unnecessary, i do think it would be good to investigate. > > > > > Not in this patch though, imo > > > > I looked into it, I'm sure we could improve but not easily, > > > > `getAndAdvanceChar` does set some flags on the token in the presence of > > > > trigraphs/line splicing and we need those flags to be set, this is the > > > > reason we do need to call that method. > > > > It's not super efficient but it's such an edge case... I'd rather not > > > > touch that now > > > My concern is that, as is, the code always takes the `else` branch > > > (except when `Result` is non-null). Assuming a buffer containing "X", the > > > pointers are arranged as follows (where `$` is one past the end). > > > \ N { X } $ > > > | | `- CurPtr > > > | `- Buffer > > > `- StartPtr > > > `CurPtr - StartPtr` is 4, but `Buffer.size() + 4` is 5 (`Buffer.size()` > > > is 1 in this case). > > > > > > I think there might be an easy test to see if this is working as > > > intended. If it isn't, I would expect a diagnostic to be issued if > > > trigraphs are enabled and the buffer contains a trigraph sequence. > > > Something like: > > > \N{LOTUS??>} > > I can try to add tests > > > > > My concern is that, as is, the code always takes the else branch (except > > > when Result is non-null). > > Yes, the if branch sole purpose is to set some state in Result. > > > > At the start of the function, StartPtr points to `\` > > And I'll leave a comment, maybe that will clear up future confusions > > > > There may be a potential refactor here, which is to have > > `getAndAdvanceChar` take a `bool & ShouldCleanupToken` parameter instead of > > a token so that we don't have to do that dance, but it's a bit outside of > > the scope of this patch... > > At the start of the function, StartPtr points to `\` > > It doesn't look like it does. The first use of `StartPtr` is at line 3314 and > it expects to read `N`: > 3314: char C = getCharAndSize(StartPtr, CharSize); > 3315: assert(C == 'N' && "expected \\N{...}"); Yes, you are right. There was a bug in \u too, probably has been there for ages. It's unfortunately not testable, any incorrect value would call getCharAndSize which is going to do the right thing. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D138861/new/ https://reviews.llvm.org/D138861 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits