Re: [GSoC’11] Lexing and parsing

BlazingWhitester Wed, 23 Mar 2011 02:38:58 -0700

On 2011-03-23 00:27:51 +0200, Ilya Pupatenko said:

Hi,
First of all, I want to be polite, so I have to introduce myself (you can skip this paragraph if you feel tired of newcomer-students’ posts). My name is Ilya, I’m a Master’s student in the IT department of Novosibirsk State University (Novosibirsk, Russia). In the Soviet period Novosibirsk became one of the most important science centers in the country, and now there are very close relations between the University and the Academy of Sciences. That’s why it’s difficult and very interesting to study here. But I’m not planning to study or work this summer, so I’ll be able to work (nearly) full time on a GSoC project. My primary specialization is seismic tomography inverse problems, but I’m also interested in programming language implementation and compilation theory. I have good knowledge of the C++ and C# languages and “intermediate” knowledge of the D language, knowledge of compilation theory, some experience in implementing lexers, parsers and translators, basic knowledge of lex/yacc/antlr and some knowledge of the Boost.Spirit library. I’m not an expert in D now, but I am willing to learn and to solve difficult tasks, and that’s why I decided to apply for GSoC.
I’m still working on my proposal (on the task “Lexing and Parsing”), but I want to write down some general ideas and ask some questions.
1. It is said that “it is possible to write a highly-integrated lexer/parser generator in D without resorting to additional tools”. As I understand it, the library should allow the programmer to write a grammar directly in D (ideally, the syntax should be somewhat similar to EBNF), and the resulting parser will be generated by the D compiler while compiling the program. This method allows parsing to be integrated into D code; it can make code simpler and sometimes even more efficient. There is a library for C++ (named Boost.Spirit) that follows the same idea. It provides a (probably not ideal but very nice) “EBNF-like” syntax for writing a grammar, and it’s quite powerful, fast and flexible. There are three parts in this library (actually there are 4 parts, but we’re not interested in Spirit.Classic now):
• Spirit.Qi (parser library for building recursive descent parsers);
• Spirit.Karma (generator library);
• Spirit.Lex (library usable to create tokenizers).
The Spirit library uses “C++ template black magic” heavily (for example, via Boost.Fusion). But D has greater metaprogramming abilities, so it is possible to implement the same functionality in an easier and “cleaner” way.
So, the question is: is it a good idea for at least the parser library architecture to be somewhat similar to Spirit’s? Of course it is not about “blind” copying, but creating the architecture for such a big system completely from scratch is quite difficult indeed. To be exact, I like the idea of parser attributes, I like the way semantic actions are described, and the “auto-rules” seem really useful.
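For readers who have not used Spirit, the ideas mentioned above (an “EBNF-like” grammar built from overloaded operators, parser attributes, semantic actions) can be illustrated with a very small sketch. This is not Boost.Spirit’s API: `Parser`, `uint_`, `lit` and `parse_sum` are hypothetical names invented for this example, and the code uses only the C++17 standard library. It assumes inputs are short enough that the integer arithmetic does not overflow.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <optional>
#include <string_view>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical miniature combinator library (NOT Boost.Spirit's API).
// A Parser<T> consumes a prefix of the input and yields an attribute of type T.
// On failure it returns nullopt and leaves the input untouched.
template <typename T>
struct Parser {
    std::function<std::optional<T>(std::string_view&)> run;

    // Semantic action: p[f] applies f to the attribute of p on success.
    template <typename F>
    auto operator[](F f) const {
        using U = std::invoke_result_t<F, T>;
        auto self = run;
        return Parser<U>{[self, f](std::string_view& in) -> std::optional<U> {
            auto r = self(in);
            if (!r) return std::nullopt;
            return f(*r);
        }};
    }
};

// Primitive: one or more decimal digits, attribute = int (no overflow check).
inline Parser<int> uint_() {
    return {[](std::string_view& in) -> std::optional<int> {
        std::size_t i = 0;
        int value = 0;
        while (i < in.size() && std::isdigit(static_cast<unsigned char>(in[i]))) {
            value = value * 10 + (in[i] - '0');
            ++i;
        }
        if (i == 0) return std::nullopt;
        in.remove_prefix(i);
        return value;
    }};
}

// Primitive: a single literal character, attribute = char.
inline Parser<char> lit(char c) {
    return {[c](std::string_view& in) -> std::optional<char> {
        if (in.empty() || in.front() != c) return std::nullopt;
        in.remove_prefix(1);
        return c;
    }};
}

// Sequence: a >> b, attribute = pair of both attributes.
template <typename A, typename B>
Parser<std::pair<A, B>> operator>>(Parser<A> a, Parser<B> b) {
    return {[a, b](std::string_view& in) -> std::optional<std::pair<A, B>> {
        std::string_view saved = in;
        auto ra = a.run(in);
        if (!ra) return std::nullopt;
        auto rb = b.run(in);
        if (!rb) { in = saved; return std::nullopt; }  // backtrack
        return std::make_pair(*ra, *rb);
    }};
}

// Kleene star: *p, attribute = vector of attributes.
// Assumes p consumes input whenever it succeeds, otherwise this would loop.
template <typename T>
Parser<std::vector<T>> operator*(Parser<T> p) {
    return {[p](std::string_view& in) -> std::optional<std::vector<T>> {
        std::vector<T> items;
        while (auto r = p.run(in)) items.push_back(*r);
        return items;
    }};
}

// Grammar in EBNF:  sum = uint { '+' uint }
inline std::optional<int> parse_sum(std::string_view text) {
    using Tail = std::vector<std::pair<char, int>>;
    // Semantic action that folds the attribute into a single int.
    auto add_up = [](const std::pair<int, Tail>& attr) {
        int total = attr.first;
        for (const auto& term : attr.second) total += term.second;
        return total;
    };
    auto sum = (uint_() >> *(lit('+') >> uint_()))[add_up];
    auto result = sum.run(text);
    if (!result || !text.empty()) return std::nullopt;  // require a full match
    return result;
}
```

Calling `parse_sum("1+2+39")` yields 42, while `parse_sum("1+")` yields an empty optional because the trailing `+` is left unconsumed. The grammar line inside `parse_sum` shows both the appeal (it reads roughly like EBNF) and the cost (every operator is overloaded to mean something other than its usual C++ meaning).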
2. Boost.Spirit is a really large and complicated library, and I doubt that it is possible to implement a library of comparable level in three months. That’s why it is extremely important to have a plan (which features should be implemented and how much time each will take). I’m still working on it, but I have some preliminary questions.
Should I have a library that is proposed and accepted into Phobos before the end of GSoC? Or is there no such strict timeframe, so that I can propose a library when all the features I want to see are implemented and tested well?
And another question: is it OK to concentrate first on the parser library and then “move” to the other parts? Of course I can choose another part to start work on, but it seems to me that the parser is the most useful and interesting part.
3. Finally, what will be next. I’ll try to make a plan (which parts should be implemented and when). Then I guess I need to describe the proposed architecture in more detail, and probably provide some usage examples(?). Is it OK if I publish ideas here to get reviews?
Anyway, I’ll need some time to work on it.

Ilya.
P.S. The funny thing is that I found a minor bug in Phobos (#5736) while trying (just for fun) to implement some tiny part of Spirit in D. Submitting bugs seems to be an important part of the task too.

Mimicking Spirit might not be a good idea. It looks sort of like a BNF grammar, but because of operator abuse there is just so much noise. A better idea might be to use D’s compile-time function evaluation to parse strings containing grammars.
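To give a rough idea of what processing a grammar string at compile time looks like, here is a small sketch. It is written in C++17 `constexpr` only as a stand-in for D’s CTFE, it merely validates the grammar text and counts its rules, and it does not generate a parser (in D that step would presumably be done with CTFE plus string mixins). The grammar notation (`name = body` rules separated by `;`) and the names `GrammarInfo`, `trim`, `balanced` and `check_grammar` are all hypothetical.

```cpp
#include <cstddef>
#include <string_view>

// Result of checking a grammar string (hypothetical notation:
// rules of the form "name = body", separated by ';').
struct GrammarInfo {
    std::size_t rules = 0;  // number of well-formed rules seen
    bool valid = true;      // false once a malformed rule is found
};

// Strip leading and trailing spaces.
constexpr std::string_view trim(std::string_view s) {
    while (!s.empty() && s.front() == ' ') s.remove_prefix(1);
    while (!s.empty() && s.back() == ' ') s.remove_suffix(1);
    return s;
}

// Check that parentheses in a rule body are balanced.
constexpr bool balanced(std::string_view body) {
    int depth = 0;
    for (char c : body) {
        if (c == '(') ++depth;
        if (c == ')' && --depth < 0) return false;
    }
    return depth == 0;
}

// Walk the grammar text rule by rule. Simplification: ';' and '=' are
// treated as separators even inside quoted literals.
constexpr GrammarInfo check_grammar(std::string_view text) {
    GrammarInfo info{};
    while (!trim(text).empty()) {
        text = trim(text);
        std::size_t end = text.find(';');
        std::string_view rule = text.substr(0, end);  // npos means "to the end"
        text.remove_prefix(end == std::string_view::npos ? text.size() : end + 1);

        std::size_t eq = rule.find('=');
        if (eq == std::string_view::npos) { info.valid = false; break; }
        std::string_view name = trim(rule.substr(0, eq));
        std::string_view body = trim(rule.substr(eq + 1));
        if (name.empty() || body.empty() || !balanced(body)) {
            info.valid = false;
            break;
        }
        ++info.rules;
    }
    return info;
}

// The grammar is an ordinary string; a mistake in it stops the build.
constexpr GrammarInfo sum_grammar =
    check_grammar("sum = num ('+' num)* ; num = digit digit*");
static_assert(sum_grammar.valid && sum_grammar.rules == 2,
              "grammar string rejected at compile time");
```

The point of this design is that the grammar stays in plain, readable notation instead of being encoded in overloaded operators, while errors in it are still reported by the compiler rather than at run time.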
