Hi Jordan, Thank you for review.
On Tue, Jun 26, 2012 at 5:18 PM, Jordan Rose <[email protected]> wrote: > Hi, Dmitri. Sorry for not pre-commit reviewing. > > On Jun 26, 2012, at 13:39 , Dmitri Gribenko <[email protected]> wrote: > >> + /// Contains text value associated with a token. >> + const char *TextPtr1; >> + unsigned TextLen1; >> + >> + /// Contains text value associated with a token. >> + const char *TextPtr2; >> + unsigned TextLen2; > > Why not StringRefs? Per Doug's comment: > + /// Contains text value associated with a token. > + StringRef TextValue1; > + > + /// Contains text value associated with a token. > + StringRef TextValue2; > > It would be nice if Token were a POD type. Could we just store ptr/length for > TextValue1 and TextValue2, so that Token remains a POD? >> +std::string BriefParser::Parse() { >> + std::string FirstParagraph; >> + std::string Brief; >> + bool InFirstParagraph = true; >> + bool InBrief = false; >> + bool BriefDone = false; >> + >> + while (Tok.isNot(tok::eof)) { >> + if (Tok.is(tok::text)) { >> + if (InFirstParagraph) >> + FirstParagraph += Tok.getText(); >> + if (InBrief) >> + Brief += Tok.getText(); >> + ConsumeToken(); >> + continue; >> + } >> + >> + if (!BriefDone && Tok.is(tok::command) && Tok.getCommandName() == >> "brief") { >> + InBrief = true; >> + ConsumeToken(); >> + continue; >> + } >> + >> + if (Tok.is(tok::newline)) { >> + if (InFirstParagraph) >> + FirstParagraph += '\n'; >> + if (InBrief) >> + Brief += '\n'; >> + ConsumeToken(); >> + >> + if (Tok.is(tok::newline)) { >> + ConsumeToken(); >> + // We found a paragraph end. >> + InFirstParagraph = false; >> + if (InBrief) { >> + InBrief = false; >> + BriefDone = true; >> + } >> + } >> + continue; >> + } >> + >> + // We didn't handle this token, so just drop it. >> + ConsumeToken(); >> + } >> + >> + if (Brief.size() > 0) >> + return Brief; >> + >> + return FirstParagraph; >> +} > > I think you can do this with one string. Once you have at least one \brief > command, you don't need to track the first paragraph anymore. Thanks, this simplifies logic, r159247. > And is it really valid to have more than one \brief command, or can we bail > out of this loop early if we find a complete \brief? Although Doxygen docs state that multiple \brief paragraph would be merged, I cannot reproduce this behavior (it only uses the first one). >> +/// Return the one past end pointer for C comments. >> +/// Very dumb, does not handle escaped newlines or trigraphs. >> +const char *findCCommentEnd(const char *BufferPtr, const char *BufferEnd) { >> + for ( ; BufferPtr != BufferEnd; ++BufferPtr) { >> + if (*BufferPtr == '*') { >> + assert(BufferPtr + 1 != BufferEnd); >> + if (*(BufferPtr + 1) == '/') >> + return BufferPtr; >> + } >> + } >> + llvm_unreachable("buffer end hit before '*/' was seen"); >> +} > > What happens in a file with unterminated comments? What happens in a file > with unterminated comments where the very last character is '*'? > > (If the answer is "this function will never be called", I'm okay with that.) An unterminated comment should not be passed from lexer to anyone. I've added a test for that in r159267. >> + >> +void Lexer::lexCommentText(Token &T) { >> + assert(CommentState == LCS_InsideBCPLComment || >> + CommentState == LCS_InsideCComment); >> + >> + switch (State) { >> + case LS_Normal: >> + break; >> + case LS_VerbatimBlockFirstLine: >> + lexVerbatimBlockFirstLine(T); >> + return; >> + case LS_VerbatimBlockBody: >> + lexVerbatimBlockBody(T); >> + return; >> + case LS_HTMLOpenTag: >> + lexHTMLOpenTag(T); >> + return; >> + } > > This seems rather asymmetrical. Can we just factor normal mode into its own > function as well? Normal state for lexing text is the hot path, I wanted to avoid a function call here. I don't really have a preference, though, because I didn't do any performance measurements yet. >> + assert(State == LS_Normal); >> + >> + const char *TokenPtr = BufferPtr; >> + assert(TokenPtr < CommentEnd); >> + while (TokenPtr != CommentEnd) { >> + switch(*TokenPtr) { >> + case '\\': >> + case '@': { >> + TokenPtr++; >> + if (TokenPtr == CommentEnd) { >> + formTokenWithChars(T, TokenPtr, tok::text); >> + T.setText(StringRef(BufferPtr - T.getLength(), T.getLength())); >> + return; >> + } > > This looks funny. Can you just keep a "frame pointer" to where BufferPtr was > at the beginning of the function, rather than relying on it being updated by > formTokenWithChars and then counting backwards? I don't have a preference here. Done in r159269. > Also, when the token body is trivially extractable from the entire token > text, I'm not sure it's worth saving separately.) I think that it is useful because the extracted text is the semantic value of the token, which is used a lot in the parser and sema. I don't think it is a good idea to require them to strip extra characters -- that's what the lexer is for. >> + default: { >> + while (true) { >> + TokenPtr++; >> + if (TokenPtr == CommentEnd) >> + break; >> + char C = *TokenPtr; >> + if(C == '\n' || C == '\r' || >> + C == '\\' || C == '@' || C == '<') >> + break; >> + } >> + formTokenWithChars(T, TokenPtr, tok::text); >> + T.setText(StringRef(BufferPtr - T.getLength(), T.getLength())); >> + return; >> + } > > Should this function be the place where leading stars are removed in C-style > comments? We can remove them there, but I would prefer to do this while post-processing comment text. Dmitri -- main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if (j){printf("%d\n",i);}}} /*Dmitri Gribenko <[email protected]>*/ _______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
