On Wed, 29 Feb 2012 21:30:57 +0100, Timon Gehr wrote:
On 02/29/2012 09:03 PM, Martin Nowak wrote:
On Wed, 29 Feb 2012 20:30:43 +0100, Timon Gehr wrote:
On 02/29/2012 07:28 PM, Martin Nowak wrote:
On Wed, 29 Feb 2012 17:41:19 +0100, Timon Gehr wrote:
On 02/28/2012 07:46 PM, Martin Nowak wrote:
https://gist.github.com/1255439 - lexer generator
https://gist.github.com/1262321 - complete and fast D lexer
Well, it is slower at lexing than DMD at parsing. What is the
bottleneck?
No, it's as fast as dmd's lexer.
Writing the tokens to stdout takes a lot of time though.
Just disable the "writeln(tok);" in the main loop.
I did that.
Interesting, I've commented it out (https://gist.github.com/1262321#L1559)
and get the following.
<<<
PHOBOS=~/Code/D/DPL/phobos
mkdir test_lexer
cd test_lexer
curl https://raw.github.com/gist/1255439/lexer.d > lexer.d
curl https://raw.github.com/gist/1262321/dlexer.d > dlexer.d
curl https://raw.github.com/gist/1262321/entity.d > entity.d
dmd -O -release -inline dlexer lexer entity
wc -l ${PHOBOS}/std/*.d
time ./dlexer ${PHOBOS}/std/*.d
./dlexer ${PHOBOS}/std/*.d 0.21s user 0.00s system 99% cpu 0.211 total
I get 0.160s for lexing using your lexer.
Parsing the same file with DMD's parser takes 0.155 seconds. The
difference grows with larger files.
Mmh, I've retested and you're right, dmd's lexer is about 2x faster.
The main overhead stems from using ranges and enforce.
Quick profiling shows that 25% of the time is spent in popFront and std.utf.stride.
Last time I worked on this I rewrote std.utf.decode to be much faster.
But UTF characters are still "decoded" twice, once for front
and then again for popFront. Also stride uses table lookup and
can't be inlined.
If switch tables were implemented on x64 one could use them for
integral ElementType.
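
To illustrate the double decoding mentioned above: one way around it is a range that decodes each code point once and caches both the result and its length, so that front is a plain field read and popFront needs no call to stride. This is only a rough sketch and not what the lexer in the gist does; CachedDecoder, prime and countCodePoints are made-up names, and std.utf.decode is the only library call used.

```d
import std.utf : decode;

// Hypothetical input range over a UTF-8 string (illustration only).
// Each code point is decoded exactly once; its value and its length in
// code units are cached until the next popFront.
struct CachedDecoder
{
    private string src;  // remaining input, starting at the current code point
    private dchar cur;   // cached code point
    private size_t len;  // length of the cached code point in code units

    this(string s)
    {
        src = s;
        prime();
    }

    @property bool empty() const { return len == 0; }

    // No decoding here, only a read of the cached value.
    @property dchar front() const
    {
        assert(!empty);
        return cur;
    }

    // Skips by the cached length, so std.utf.stride is never called.
    void popFront()
    {
        assert(!empty);
        src = src[len .. $];
        prime();
    }

    // Decodes the next code point, if any, and remembers its length.
    // decode throws a UTFException on malformed input.
    private void prime()
    {
        if (src.length == 0)
        {
            len = 0;
            return;
        }
        size_t i = 0;
        cur = decode(src, i); // advances i past the decoded code point
        len = i;
    }
}

// Small helper to exercise the range.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (c; CachedDecoder(s))
        ++n;
    return n;
}

void main()
{
    // 'a' is 1 code unit, U+00E9 is 2, U+20AC is 3: three code points.
    assert(countCodePoints("a\u00e9\u20ac") == 3);
    assert(countCodePoints("") == 0);
}
```

The cost is two extra fields per range and eager decoding of one code point ahead; whether that beats the current front/popFront pair would have to be measured.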