> Regarding the regex ctags version of the parser vs Geany's version of the 
> parser: we could use the regex ctags version, I was just thinking that since 
> we have the hand-written version in Geany already, it might be a base for a 
> hand-written parser that could eventually be submitted upstream so I kept the 
> `geany_` parser. Hand-written parsers tend to offer better flexibility in 
> parsing and are much faster than regex parsers. But before such a parser 
> could be submitted upstream, it would have to offer all the functionality the 
> current regex parser offers.

I think the custom parser is currently a bit messy and could use some 
restructuring, but your point is valid.
Right now, I think the only real advantage of the regex version of the parser 
is its readability, but it doesn't seem to be really leveraging the full power 
of regexes -- it is rather simple and can probably be translated to "plain C" 
easily.
(Also, at first glance, it seems that those regexes aren't too good either; for 
example, I believe the second one will match `functionality = 42` too.)

Re: speed, I see four "levels" in which the parser can be implemented:
1. Compare characters one by one
2. `strncmp()` and `strstr()`
3. **`sscanf()`** :bulb:
4. Regular expressions
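
For comparison with the `sscanf()` option, levels 1 and 2 might look roughly like the sketch below. `skip_keyword` is a hypothetical helper, not something taken from the Geany or ctags sources; it skips whitespace character by character and then compares the keyword with `strncmp()`.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (not from Geany/ctags): if `line` starts with the
 * keyword `kw` as a whole word, after optional leading whitespace, return
 * a pointer just past the keyword; otherwise return NULL. */
static const char *skip_keyword(const char *line, const char *kw)
{
    size_t len = strlen(kw);

    /* Level 1: step over leading whitespace one character at a time. */
    while (isspace((unsigned char) *line))
        line++;

    /* Level 2: compare the keyword as a block. */
    if (strncmp(line, kw, len) != 0)
        return NULL;

    /* Reject identifiers that merely begin with the keyword,
     * e.g. "functionality". */
    if (isalnum((unsigned char) line[len]) || line[len] == '_')
        return NULL;

    return line + len;
}
```

Every extra piece of syntax (output lists, `=`, the function name) needs another hand-written step like these, which is where the room for errors comes from.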

I think the regex parser could easily be re-implemented with `sscanf()` as a 
faster alternative to regexes, so if that's an option I think it'd be an elegant 
solution -- readable, efficient, and less prone to errors than options 1 and 2.

Something like
```c
/* buffer is assumed to be char[64]; the %63 width prevents overflow */
if (sscanf(line, " function [%*[^]%]] = %63[A-Za-z0-9_]", buffer) == 1) ...
if (sscanf(line, " function%*[ \t]%*[A-Za-z0-9_] = %63[A-Za-z0-9_]", buffer) == 1) ...
```
etc.
(where `" "` matches zero or more whitespace chars, `"%*[ \t]"` matches one or 
more spaces/tabs, `"%*[^]%]"` matches anything but `]` and `%`, etc -- it's not 
incredibly readable, but it's fast.)
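
Wrapped into a function, the two patterns above could be used as in the following minimal sketch. `parse_function_name` and the 64-byte name buffer are assumptions made for illustration, and character ranges such as `A-Z` inside a scanset are strictly implementation-defined, although glibc and the other common C libraries support them.

```c
#include <stdio.h>

/* Hypothetical helper: if `line` is a function definition with an output
 * list or a single output, copy the function name into `name` and return 1;
 * otherwise return 0. `name` must have room for 64 bytes. */
static int parse_function_name(const char *line, char *name)
{
    /* function [out1, out2] = name(...) */
    if (sscanf(line, " function [%*[^]%]] = %63[A-Za-z0-9_]", name) == 1)
        return 1;

    /* function out = name(...) */
    if (sscanf(line, " function%*[ \t]%*[A-Za-z0-9_] = %63[A-Za-z0-9_]", name) == 1)
        return 1;

    return 0;
}
```

With these two patterns, `function [a, b] = foo(x)` yields `foo` and `function y = bar(x)` yields `bar`, while `functionality = 42` is rejected because `%*[ \t]` requires at least one space or tab after the keyword. Definitions without an output, such as `function baz(x)`, would need a third pattern.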

So, what do you think?  Would `sscanf()` be fast enough, or would it be better 
to keep matching individual chars and substrings?

> Probably could be done by checking if after
> 
> ```
> p=(const unsigned char*) strstr ((const char*) line, "struct");
> ```
> 
> `p-1` and `p+6` are not alnum (plus all the necessary range checks).

That still won't ignore words in strings (and maybe other corner cases).
Honestly I think I'd entirely ditch parsing structs; knowing that a certain 
variable at a certain point in the program is a struct isn't really that 
relevant, and universal-ctags doesn't do it anyway.  Class parsing would be 
more useful.
For a similar reason, I'd avoid parsing all variables as universal-ctags does; 
having a list with EVERY variable assignment in EVERY function in the script 
seems excessive.  (However, it might be a good idea to list `global` and 
`persistent` variables.)
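
For reference, the `strstr()` boundary check quoted above could be sketched as follows. `find_word` and `is_ident_char` are hypothetical names, and treating `_` as part of an identifier is an addition to the "not alnum" test from the quote.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: true for characters that can be part of an identifier. */
static int is_ident_char(char c)
{
    return isalnum((unsigned char) c) || c == '_';
}

/* Hypothetical helper: return a pointer to the first occurrence of `word`
 * in `line` that is not embedded in a longer identifier, or NULL. */
static const char *find_word(const char *line, const char *word)
{
    size_t len = strlen(word);
    const char *p = line;

    if (len == 0)
        return NULL;

    while ((p = strstr(p, word)) != NULL)
    {
        /* Range checks: p[-1] is read only when p is not at the start of
         * the line; p[len] is at worst the terminating NUL. */
        int left_ok = (p == line) || !is_ident_char(p[-1]);
        int right_ok = !is_ident_char(p[len]);

        if (left_ok && right_ok)
            return p;
        p++;  /* keep searching after this false match */
    }
    return NULL;
}
```

This rejects `constructor = 1` and `my_struct = 1`, but it still reports a match for `disp('struct')`, which is the string-literal case mentioned above.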

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3358#issuecomment-1369307109