Uh, I find this review confusing. Do your (Jon) comments refer to the code above them? (more below)
On 3/16/26 4:03 PM, Jonathan Corbet wrote: > On Thu, 12 Mar 2026 15:54:25 +0100, Mauro Carvalho Chehab > <[email protected]> wrote: >> Handling C code purely using regular expressions doesn't work well. >> >> Add a C tokenizer to help doing it the right way. >> >> The tokenizer was written using as basis the Python re documentation >> tokenizer example from: >> https://docs.python.org/3/library/re.html#writing-a-tokenizer >> >> Signed-off-by: Mauro Carvalho Chehab <[email protected]> >> Message-ID: >> <c63ad36c81fe043e9e33ca55630414893f127413.1773074166.git.mchehab+hua...@kernel.org> >> Message-ID: >> <8541ffa469647db1a7154f274fb2d55b4c127dcb.1773326442.git.mchehab+hua...@kernel.org> > > This is a combined effort to review this patch and to try out "b4 review", > we'll see how it goes :). > >> diff --git a/tools/lib/python/kdoc/kdoc_re.py >> b/tools/lib/python/kdoc/kdoc_re.py >> index 085b89a4547c0..7bed4e9a88108 100644 >> --- a/tools/lib/python/kdoc/kdoc_re.py >> +++ b/tools/lib/python/kdoc/kdoc_re.py >> @@ -141,6 +141,240 @@ class KernRe: >> [ ... skip 4 lines ... ] >> + >> + @staticmethod >> + def __str__(val): >> + """Return the name of an enum value""" >> + return TokType._name_by_val.get(val, f"UNKNOWN({val})") >> + > > What is this class supposed to do? > >> [ ... skip 27 lines ... ] >> + _name_by_val = {v: k for k, v in dict(vars()).items() if isinstance(v, >> int)} >> + >> + # Dict to convert from string to an enum-like integer value. >> + _name_to_val = {k: v for v, k in _name_by_val.items()} >> + >> + @staticmethod > > This stuff strikes me as a bit overdone; _name_to_val is really just the > variable list for the class, right? > >> [ ... skip 30 lines ... ] >> + f"{self.brace_level}, {self.paren_level}, >> {self.bracket_level})" >> + >> +#: Tokens to parse C code. >> +TOKEN_LIST = [ >> + (CToken.COMMENT, r"//[^\n]*|/\*[\s\S]*?\*/"), >> + > > So these aren't "tokens", this is a list of regexes; how is it intended > to be used? > >> + (CToken.STRING, r'"(?:\\.|[^"\\])*"'), >> + (CToken.CHAR, r"'(?:\\.|[^'\\])'"), >> + >> + (CToken.NUMBER, r"0[xX][0-9a-fA-F]+[uUlL]*|0[0-7]+[uUlL]*|" > > How does "[\s\S]*" differ from plain old "*" ? > >> [ ... skip 15 lines ... ] >> + (CToken.STRUCT, r"\bstruct\b"), >> + (CToken.UNION, r"\bunion\b"), >> + (CToken.ENUM, r"\benum\b"), >> + (CToken.TYPEDEF, r"\bkinddef\b"), >> + >> + (CToken.NAME, r"[A-Za-z_][A-Za-z0-9_]*"), > > "-" and "!" never need to be escaped. > >> + >> + (CToken.SPACE, r"[\s]+"), >> + >> + (CToken.MISMATCH,r"."), >> +] >> + > > "kinddef" ? What does that refer to? > >> +#: Handle C continuation lines. >> +RE_CONT = KernRe(r"\\\n") >> + >> +RE_COMMENT_START = KernRe(r'/\*\s*') >> + > > Don't need the [brackets] here what brackets? > >> [ ... skip 6 lines ... ] >> + >> + When converted to string, it drops comments and handle public/private >> + values, respecting depth. >> + """ >> + >> + # This class is inspired and follows the basic concepts of: > > That seems weird, why don't you just initialize it here? I can't tell what that comments refers to. >> [ ... skip 14 lines ... ] >> + source = RE_CONT.sub("", source) >> + >> + brace_level = 0 >> + paren_level = 0 >> + bracket_level = 0 >> + > > Do you mean "iterator" here? Ditto. >> [ ... skip 33 lines ... ] >> + in this particular case, it makes sense, as we can pick the name >> + when matching a code via re_scanner(). >> + """ >> + global re_scanner >> + >> + if not re_scanner: > > Putting __init__() first is fairly standard, methinks. > >> [ ... skip 15 lines ... ] >> + >> + for tok in self.tokens: >> + if tok.kind == CToken.BEGIN: >> + show_stack.append(show_stack[-1]) >> + >> + elif tok.kind == CToken.END: > > I still don't understand why you do this here - this is all constant, right? > >> + prev = show_stack[-1] >> + if len(show_stack) > 1: >> + show_stack.pop() >> + >> + if not prev and show_stack[-1]: > > So you create a nice iterator structure, then just put it all together into a > list anyway? > -- ~Randy

