Re: [Maria-developers] SQL Parser

Antony T Curtis Sat, 09 May 2009 09:19:07 -0700


On 9 May 2009, at 00:35, Michael Widenius wrote:

Hi!

I fixed the maria-developers address so that everyone else can follow
and participate in the discussion.

jim> jim> Monty,
jim> jim>
jim> jim> My T-SQL parser is now complete and a number of people areusingjim> jim> the Java, C# and C version. Would you like to have a lookat it?
jim>
jim> Monty> Yes, I would be intrested to know more about this!
jim>
jim> OK - I have my demo online now. If you visit http://www.temporal-wave.com
jim>
jim> and take the Online Demos->T-SQL menu option you can enter T-SQL
jim> batches and see the parse results. Over the next day or so, I am
jim> adding demos of my C# and VB.Net parsers, so if the site goesup and
jim> down then just wait a few minutes and try again :-) Once I have
jim> evaluated some of the things you mention below then I will letyou
jim> have a scan at the source code.

ok. I will take a look as soon as I have got through the emails that
has batched up during the last 3 weeks.
jim> jim> I think that I could steal a lot of common syntax fromthis for ajim> new MySQL parser. There are 1100 regression tests culledfrom thejim> T-SQL language spec and user examples; on my Linux systemthe parse
jim>     time, including building and walking the AST is:
jim>     jim>
jim>     jim> jimi(50_64)-clean: wc -l *.sql
jim>     jim> ...
jim>     jim> 12923 total
jim>     jim>
jim>     jim> jimi(50_64)-clean: time tsqlc . >out
jim> jim> 0.15user 0.19system 0:00.34elapsed 100%CPU (0avgtext+0avgdata
jim>     0maxresident)k
jim>     jim> 0inputs+232outputs (0major+102772minor)pagefaults 0swaps
jim>
jim> jim> Which includes the time taken to read each of the 1100files fromjim> the file system and to print a message for each saying theparse was
jim>     successful of course.
jim> jim> Anyway, if you want to take a look at the code, justlet me know :-)
jim>
jim>     Monty> The big question is how much work it would be to
jim>            replace the MySQLparser with your T-SQL parser.

jim> Indeed. Obviously syntactical changes are just slog really and
jim> most of that is ensuring that there is enough coverage inregression
jim> tests to exercise all the syntactic differences between T-SQL and
jim> MySql, or rather, all the syntax of MySQL. A lot of commonalitybutjim> still a fair bit of work. What is the definitive languagereferencejim> for MySQL Marina? I can locate reference manuals and ANSI specsofjim> course, but if you have one particular document in mind, thenit wouldjim> be useful to work off that so I can evaluate the syntacticalchanges I
jim> would need to make.

Unfortunately there is no other spec than the sql_yacc.yy file that
comes with the MySQL source code.

I have a tool somewhere which can build .dot files from the .yy filesin the same way how ANTLRworks draws its syntax diagrams.... Copies ofthe tool exist somewhere on the old MySQL internal repositories but Iam sure I have a backup copy somewhere.

jim>
jim>     I would like to do this in two steps:
jim>
jim> a) Replace the mysql_yacc.yy file, with as few changes aspossible in
jim>       the rest of the MySQL source.
jim>
jim>
jim> ANTLR is essentially the same gestalt as bison: one or moresource files define the grammar and contain actions in C, triggeredat the parse points; These grammar definition files are convertedinto C source by invoking an external program (in the case of ANTLRit is a Java program) and the resulting .c/.h files are compiledwith the rest of the code.
jim>
jim> I wrote the C output to be machine independent. The idea beingthat you can generate the .c files on one machine, check that sourcecode in and use it on any other machine without re-generating. Thiscan be useful when targeting systems without Java machines (or onesthat don't work very well). So, like many other systems you can havea 'build completely from scratch' option and a 'use pre-generatedfiles option'.
jim>
jim> In the main then, I suspect that the bigger changes to theMySQL base would be build changes. I say this based upon theassumption that any new parser should build the same AST for a queryas the yacc based version does. This means the new parser makes thesame calls as the old one and thus the code external to the parserwould probably require little, if anything in the way of change (butsee b below).
That sounds very good.

Not possible to build the same AST as the yacc/bison based parserbecause there is no AST.

The execution structures are directly built during the parsing phase.

jim> b) Optimze MySQL to be even faster, with the new parser ata base.
jim>
jim> Obviously there are two aspects, being the speed of the parseritself, which I don't think will be an issue, and then what kind oftransformations are done on the resulting AST for optimizationpurposes. If we use exactly the same AST as MySQL does now, thenimproving MySQL performance is the same problem domain as it is now.However, perhaps there is opportunity for a new parser to providemore information about the query that would aid optimization.Possibilities here include modifications to the AST, externalinformation structures that current code can use when walking theAST, or even a completely new AST. The latter would probably havequite an effect on the front end of the parse->AST->execution phaseof course, so I would think that the first step is to try to achieveyour step a with the existing AST as output.

Agree

You will have to implement an all-new AST which then can be used to'compile' the execution structures.(this was a minor sticking part of an earlier attempt to replace theparser - there were objections to building an AST and adding anadditional stage to the execution)

jim> Of course maintaining and changing an ANTLR based grammar issomewhat easier than changing bison/yacc, if for no other reasonthan ANTLR doesn't need much in the way of code assists for correctparsing, whereas Bison quite often does.
jim> The a) option is quite important to be able to do easymerges with the
jim>     main MySQL tree.
jim>
jim> I have time this week to take a look at this. Mechanically Iknow it is easy enough, but I need to look at the calls that aremade from the current yacc/bison parser and see if there is anythingin there that would be awkward. The only things I could think ofwould be if the existing code relies on internal structures of bisonrather than more generic structures. That should be easy to define.
The MySQL code doesn't rely on bison structures at all (as far as Iknow).

AFAIK, MySQL does not use any Bison internals as such. A past projectfrom many years ago, I put into MySQL a PCCTS (pre-ANTLR) LL parser,basically keeping MySQL's custom lexer but it generated Token objectsinstead suitable for the PCCTS generated parser. The project was nevercompleted because integration of Stored Procedures became theobjective for MySQL 5.0 instead of implementation of a new parser.

I have asked Stewart Smith to see if he can help make public that oldsource repository as a Bazaar repository as people may be able tolearn from it how to replace the parser without major pain.

Did you get time to do the analyse?

jim>
jim> A few other things that may be of note:
jim>
jim> 1) The ANTLR generated code is free threading so long as anyaction code that you place withing the parser is also free threading.
In other words, same as with bison

The ANTLR runtime does, of course, require Java. So developers whowish to extend the grammar do have to have the Java Runtime installedas well as the ANTLR jars.

jim> 2) The only system calls it uses is {m|c|re}alloc/free, whichthe C runtime component can re-direct (say to the MySQL overrides).For completely free threading, those calls also need to be freethreading and so they are the only limit to threading of course.
Sounds perfect.

I would *really* like to see a new parser for MariaDB. Anything that
we can do to help get this done ?


I too am interested to help if at all possible.

Regards,
Antony,


_______________________________________________
Mailing list: https://launchpad.net/~maria-developers
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~maria-developers
More help   : https://help.launchpad.net/ListHelp

Re: [Maria-developers] SQL Parser

Reply via email to