Re: [HACKERS] scanner/parser minimization

2013-03-14 Thread Heikki Linnakangas
On 13.03.2013 10:50, Simon Riggs wrote: On 2 March 2013 18:47, Heikki Linnakangas hlinnakan...@vmware.com wrote: Interestingly, the yy_transition array generated by flex used to be much smaller: 8.3: 22072 elements 8.4: 62623 elements master: 64535 elements The big jump between 8.3 and 8.4

Re: [HACKERS] scanner/parser minimization

2013-03-14 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes: I hear no objection, so committed. (after fixing some small bugs in the patch, and adding some comments) Please keep psqlscan.l in sync with scan.l. regards, tom lane -- Sent via pgsql-hackers mailing list

Re: [HACKERS] scanner/parser minimization

2013-03-13 Thread Simon Riggs
On 2 March 2013 18:47, Heikki Linnakangas hlinnakan...@vmware.com wrote: Uh ... no. I haven't looked into why the flex tables are so large, but this theory is just wrong. See ScanKeywordLookup(). Interestingly, the yy_transition array generated by flex used to be much smaller: 8.3:

Re: [HACKERS] scanner/parser minimization

2013-03-08 Thread Bruce Momjian
On Thu, Feb 28, 2013 at 04:09:11PM -0500, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: A whole lot of those state transitions are attributable to states which have separate transitions for each of many keywords. Yeah, that's no surprise. The idea that's been in the back of

Re: [HACKERS] scanner/parser minimization

2013-03-02 Thread Greg Stark
Regarding yytransition I think the problem is we're using flex to implement keyword recognition which is usually not what it's used for. Usually people use flex to handle syntax things like quoting and numeric formats. All identifiers are handled by flex as equivalent. Then the last step in the

Re: [HACKERS] scanner/parser minimization

2013-03-02 Thread Tom Lane
Greg Stark st...@mit.edu writes: Regarding yytransition I think the problem is we're using flex to implement keyword recognition which is usually not what it's used for. Usually people use flex to handle syntax things like quoting and numeric formats. All identifiers are handled by flex as

Re: [HACKERS] scanner/parser minimization

2013-03-02 Thread Robert Haas
On Thu, Feb 28, 2013 at 4:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: I believe however that it's possible to extract an idea of which tokens the parser believes it can see next at any given parse state. (I've seen code for this somewhere on the net, but am too lazy to go searching for it again

Re: [HACKERS] scanner/parser minimization

2013-03-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Thu, Feb 28, 2013 at 4:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: I believe however that it's possible to extract an idea of which tokens the parser believes it can see next at any given parse state. (I've seen code for this somewhere on the net, but

Re: [HACKERS] scanner/parser minimization

2013-03-02 Thread Heikki Linnakangas
On 02.03.2013 17:09, Tom Lane wrote: Greg Stark st...@mit.edu writes: Regarding yytransition I think the problem is we're using flex to implement keyword recognition which is usually not what it's used for. Usually people use flex to handle syntax things like quoting and numeric formats. All

Re: [HACKERS] scanner/parser minimization

2013-03-01 Thread Peter Eisentraut
On 2/28/13 3:34 PM, Robert Haas wrote: It's possible to vastly reduce the size of the scanner output, and therefore of gram.c, by running flex with -Cf rather than -CF, which changes the table representation completely. I assume there is a sound performance reason why we don't do this, but

[HACKERS] scanner/parser minimization

2013-02-28 Thread Robert Haas
Today's b^Hdiscussion on materialized views reminded me that I spent a little bit of time looking at gram.y and thinking about what we might be able to do to reduce the amount of bloat it spits out. On my system, without debugging symbols, gram.o is 1019260 bytes. Using nm gram.o | sort | less

Re: [HACKERS] scanner/parser minimization

2013-02-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: A whole lot of those state transitions are attributable to states which have separate transitions for each of many keywords. Yeah, that's no surprise. The idea that's been in the back of my mind for a while is to try to solve the problem at the lexer