Colomban: Many thanks for your guidance.
There is a Jal IDE called PicShell, written in Python. It had a rudimentary Jal parser to extract variables, etc., and build a tree pane like Geany's (it uses the Scintilla editor). I rewrote the parser to use regular expressions to get the data, so I know how to do it. In Geany, I didn't know where to start.

Once I understand how Geany works, I'll play around. Then I can worry about submitting it for possible inclusion into Geany, although I'm not sure how wide an audience it will appeal to.

I'll be back with more questions, I'm sure!

--
Larry Bradley
Orleans (Ottawa) Canada

On Wed, 2014-01-29 at 18:34 +0100, Colomban Wendling wrote:
> Hi,
>
> Le 29/01/2014 17:37, Larry Bradley a écrit :
> > [...]
> >
> > I have the geany 1.23 source, and I've actually made some changes to
> > the VHDL scintilla lexer and filetypes.vHDL to handle folding and
> > syntax highlighting properly.
>
> You should use the development version (Git repository), so your changes
> would be easier to merge later.
>
> > However, I would like to do a better job of supporting Jal.
>
> First of all, you should take a look at the file named HACKING in the
> source tree. It contains much generic and specific guidance on how to
> hack on the Geany source, and has a specific section for new filetypes.
>
> > In particular, I would like the symbol tree to be able to show the
> > variables and constants defined in a Jal program. Using the VHDL
> > filetype, Geany shows the functions and procedures (I did nothing to
> > cause this to happen), but not the variables.
> >
> > Only some filetypes actually display variables. Basic, for example,
> > does, while Pascal does not.
>
> The symbols are extracted with a CTags parser, e.g.
> tagmanager/ctags/vhdl.c.
> Whether or not a particular type of symbol
> appears in Geany depends on basically two things:
>
> 1) the ability of the relevant parser to generate "tags" for those symbols;
>
> 2) whether or not the type of those generated tags is mapped to a
> category displayed in the symbols list.
>
> The first point obviously requires the parser to be tuned to handle a
> particular thing. The second depends both on what type the parser
> reports for the tags, and on whether this type is mapped to something
> for this language in src/symbols.c:add_top_level_items().
>
> > I've no problems with making changes to the Geany code, but I've no
> > idea where to start with the display of variables and constants. The
> > scintilla lexers that I've seen, and the scintilla documentation do
> > not make it really obvious how one writes a lexer.
>
> Scintilla lexers do not generate tags; that is the job of CTags parsers.
>
> The true difference between those in how they work is that the goal of a
> Scintilla lexer is only to highlight the code properly, which generally
> requires only basic knowledge of the syntax (e.g. what is a
> string, a comment, etc.) -- basically, only the first step of
> general language understanding is required: identifying tokens. Having
> a very tolerant Scintilla lexer is a good thing, since it's definitely
> meant to highlight a document during modification.
>
> On the other hand, since the CTags parser has to extract particular
> information from the data, it has to understand some parts. In general,
> this requires the first step (dividing into tokens), although sometimes
> only very basic differentiation is required [1]; but also the second
> step: understanding what those tokens actually mean to some extent.
> Whether it has to understand the whole language or not depends on
> how the language is constructed and how clever the programmer of that
> parser is at finding tricks.
> For example, a language that uses keywords to
> introduce everything the parser wants to extract (PHP or Python pretty
> much fit) can simply search for those keywords and start
> extracting the relevant information from there, without caring much
> about what is in-between. On the other hand, for languages with a more
> "free" syntax (like C, C++ and other crazy languages :), the parser may
> have to care more just to be able to find what is interesting (e.g. one
> could imagine a C or C++ parser cutting the input into statements, and
> then analyzing the content of those statements).
>
> In practice however, one will generally take as a basis an existing
> parser or lexer for a language similar to the one she wants to support.
>
> For writing your Scintilla lexer, pick one for a similar language (here
> you picked VHDL IIUC), copy it and modify it. If the language in
> question really only has small differences from an existing one, one
> might even simply tweak an existing lexer to handle both languages --
> but this should be done with caution so as not to make things hard to
> follow, and should only be used for very similar languages.
> Note that in the context of a Scintilla lexer, "very similar" is more
> about what syntactic elements exist and what their syntax is (comments,
> strings, etc.) than about how the language works. For example C, C++,
> Java, JavaScript and a few others all use the same lexer, because most
> of their syntactic elements are the same.
> Also, Scintilla is a separate project, and we prefer new lexers to be
> integrated into it before we add them, so we don't diverge. But don't
> worry, Scintilla easily accepts new lexers.
>
> For the CTags parser, it's less commonly a good idea to have one single
> parser for different languages, because generally the changes are larger
> -- unless of course one language is a perfect superset or subset
> of another one.
> There are 2 types of CTags parsers: regular-expression based parsers,
> and plain C ones.
>
> 1) Regular expression parsers are quite simple, and simply consist of a
> set of line-based regular expressions that extract the tags. These are
> limited (it is impossible to really handle multi-line constructs like
> comments or multi-line strings), but really simple.
>
> 2) Plain C parsers are more complex, but can handle anything the
> programmer can handle. They are just normal C code reading the data and
> handling it in any appropriate manner.
>
> Ah, and don't take the C parser (c.c) as an example -- don't even look
> at it if you don't want to go crazy ;)
> If you want a nice complete (and complex) parser for a relatively easy
> language, you can look at the ones for PHP and Rust.
>
>
>
> Anyway, don't hesitate to ask any further questions you have.
>
> Regards,
> Colomban
>
>
> [1] e.g. it's generally not important to differentiate an identifier
> from a number constant, because in most languages they are used the
> same way, and if a number appears where an identifier is expected it
> only means malformed input.
> _______________________________________________
> Devel mailing list
> Devel@lists.geany.org
> https://lists.geany.org/cgi-bin/mailman/listinfo/devel
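
The line-based regular-expression approach described in point 1) of the quoted reply can be illustrated with a minimal sketch. It is written in Python (the language PicShell uses) and is not a CTags parser or any part of Geany. The patterns, the kind names and the `extract_tags` helper are all hypothetical. They assume a simplified Jal syntax: one declaration per line, `var [volatile] <type> <name>`, `const [<type>] <name> = ...`, case-insensitive keywords, and comments starting with `--` or `;`.

```python
import re

# Assumption: Jal comments start with "--" or ";" and run to end of line.
# This naive split would also cut inside a string literal containing them.
COMMENT = re.compile(r"--|;")

# Hypothetical, simplified patterns; group 1 is always the symbol name.
# Each pattern only ever sees a single line, so constructs spanning
# several lines cannot be handled (the limitation noted in the thread).
PATTERNS = [
    ("procedure", re.compile(r"^\s*procedure\s+(\w+)", re.IGNORECASE)),
    ("function", re.compile(r"^\s*function\s+(\w+)", re.IGNORECASE)),
    # const [<type>] <name> = ...
    ("constant", re.compile(r"^\s*const\s+(?:\w+\s+)?(\w+)\s*=", re.IGNORECASE)),
    # var [volatile] <type> <name>
    ("variable", re.compile(r"^\s*var\s+(?:volatile\s+)?\w+\s+(\w+)", re.IGNORECASE)),
]


def extract_tags(source):
    """Return a list of (name, kind, line_number) tuples found in source."""
    tags = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Drop any trailing comment before matching.
        code = COMMENT.split(line, maxsplit=1)[0]
        for kind, pattern in PATTERNS:
            match = pattern.match(code)
            if match:
                tags.append((match.group(1), kind, lineno))
                break  # at most one tag per line in this sketch
    return tags
```

Calling `extract_tags` on a Jal source string yields one tuple per recognised declaration. Whether a given kind (for example "variable") would then show up in a symbol tree is a separate decision, which corresponds to the second condition in the quoted reply.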