Hi Dmitry, Brian, all,
Here's a scanner patch that I mentioned a while ago, with a possible way to
work around the re2c EOF handling issues.
The primary change is to do a "manual scan", like I talked about, in areas
that can match large amounts of text and can contain NUL bytes (strings and
comments, which are now scanned faster too), as is already done for inline
HTML. I called it a "diet" :-) because it removes my complicated string regex
patterns from a couple of years ago. That doesn't make the .l file much
smaller once the manual scan code is added (though it's easier to
understand...?), but it does result in a ~34k reduction in 5.3's generated
.c file...
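For anyone unfamiliar with the idea, here's a rough C sketch of what a "manual scan" of a /* ... */ comment can look like. It's not the code from the patch: scan_block_comment() and its cursor/limit interface are made up for illustration. The point is that the loop is bounded by an explicit end pointer (memchr() takes a length), so NUL bytes inside the comment are just ordinary bytes. An unterminated comment simply stops at the end of the input instead of depending on re2c's EOF handling.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the actual zend_language_scanner.l code):
 * manually scan a block comment, starting just after its opening slash-star.
 * The input is bounded by an explicit limit rather than a terminating NUL,
 * so NUL bytes inside the comment are treated like any other byte.
 * Returns a pointer just past the closing star-slash, or `limit` if the
 * comment is unterminated (it then runs to the end of the input). */
static const char *scan_block_comment(const char *cursor, const char *limit)
{
    while (cursor < limit) {
        /* memchr() honors the length, unlike strchr(), so NUL doesn't stop it */
        const char *star = memchr(cursor, '*', (size_t)(limit - cursor));
        if (star == NULL || star + 1 >= limit) {
            return limit; /* no terminator before the end of input */
        }
        if (star[1] == '/') {
            return star + 2; /* just past the terminator */
        }
        cursor = star + 1; /* a lone star: keep going (handles runs of stars) */
    }
    return limit;
}
```

Because the scan is driven entirely by `limit`, the same code handles a normal comment, one with embedded NUL bytes, and one that is cut off at EOF.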
This fixes Bug #46817, and is also a better, more proper fix for the older
Bug #42767. Both are related to ending comments.
Inline HTML chunks are no longer broken up when a tag starting with "s" is
encountered (<script> for JS, <span>, etc.), since such a tag is unlikely to
be a long PHP <script> tag.
Previously, an opening PHP <SCRIPT> tag written with a capital "S" was missed
if it wasn't the first thing scanned:
var_dump(token_get_all("HTML... <SCRIPT language=php>"));
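Here's a hypothetical C sketch of the inline HTML handling described above. It's not the patch's actual code; scan_inline_html(), is_php_script_tag() and the small helpers are invented names. It assumes short open tags are enabled, so any "<?" ends the chunk, and it ignores ASP tags. A "<s" or "<S" only ends the chunk if a PHP <script language=php> tag really starts there. That tag is matched case-insensitively, so <SCRIPT ...> is found anywhere in the HTML, not just at the start. The attribute check is deliberately simplified compared to a real scanner rule.

```c
#include <stddef.h>
#include <string.h>

/* Case-insensitive ASCII prefix match; `word` must be lowercase.
 * Returns the length matched, or 0 if `p` doesn't start with `word`. */
static size_t match_ci(const char *p, const char *limit, const char *word)
{
    size_t n = strlen(word);
    if ((size_t)(limit - p) < n) return 0;
    for (size_t i = 0; i < n; i++) {
        char c = p[i];
        if (c >= 'A' && c <= 'Z') c = (char)(c - 'A' + 'a');
        if (c != word[i]) return 0;
    }
    return n;
}

static const char *skip_ws(const char *p, const char *limit)
{
    while (p < limit && (*p == ' ' || *p == '\t' || *p == '\r' || *p == '\n')) p++;
    return p;
}

/* Hypothetical, simplified check for <script language=php> (any case,
 * optional quote before "php"). */
static int is_php_script_tag(const char *p, const char *limit)
{
    size_t n;
    const char *q;
    if (!(n = match_ci(p, limit, "<script"))) return 0;
    p += n;
    q = skip_ws(p, limit);
    if (q == p) return 0; /* need whitespace before the attribute */
    if (!(n = match_ci(q, limit, "language"))) return 0;
    p = skip_ws(q + n, limit);
    if (p >= limit || *p != '=') return 0;
    p = skip_ws(p + 1, limit);
    if (p < limit && (*p == '"' || *p == '\'')) p++;
    return match_ci(p, limit, "php") != 0;
}

/* Returns where the inline HTML chunk ends: at the next PHP open tag,
 * or at `limit` if there is none. */
static const char *scan_inline_html(const char *cursor, const char *limit)
{
    while (cursor < limit) {
        const char *lt = memchr(cursor, '<', (size_t)(limit - cursor));
        if (lt == NULL || lt + 1 >= limit) {
            return limit; /* the rest of the input is HTML */
        }
        if (lt[1] == '?') {
            return lt; /* "<?" opens PHP (short tags assumed on) */
        }
        if ((lt[1] == 's' || lt[1] == 'S') && is_php_script_tag(lt, limit)) {
            return lt; /* a real PHP <script> tag, in any letter case */
        }
        /* <span>, JS <script>, etc. stay part of the same HTML chunk */
        cursor = lt + 1;
    }
    return limit;
}
```

The design choice is that the common case ("<s" that is *not* a PHP tag) costs only a quick prefix check and never splits the chunk.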
Single-line comments ending with a Windows newline didn't include the full \r\n:
var_dump(token_get_all("<?php // Comment\r\n?>"));
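A similar hypothetical sketch for single-line comments. scan_line_comment() is a made-up name, and the behavior is my reading of the description above, not the patch's code. A "\r" immediately followed by "\n" is consumed as one newline, and a "?>" ends the comment without being swallowed by it.

```c
#include <stddef.h>

/* Hypothetical sketch: scan a single-line comment starting just after
 * "//" (or "#"). The comment token ends after the newline, and a Windows
 * "\r\n" is consumed as a whole rather than stopping after the "\r".
 * A "?>" ends the comment but is not part of it, so the close tag is
 * still returned as its own token. */
static const char *scan_line_comment(const char *cursor, const char *limit)
{
    while (cursor < limit) {
        switch (*cursor) {
        case '\r':
            if (cursor + 1 < limit && cursor[1] == '\n') {
                return cursor + 2; /* include the full \r\n */
            }
            return cursor + 1; /* lone \r newline */
        case '\n':
            return cursor + 1;
        case '?':
            if (cursor + 1 < limit && cursor[1] == '>') {
                return cursor; /* leave "?>" for the next token */
            }
            break;
        default:
            break;
        }
        cursor++;
    }
    return limit; /* comment runs to the end of input */
}
```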
Finally, as part of the optimized scanning of double-quoted strings, when the
first variable is encountered (making the string non-constant), the amount
that's been scanned up to that point is remembered. That part can then be
skipped over (up to the variable) after the opening quote token is returned.
Previously, that initial part of the string was rescanned, at a cost that
depended on how far "into" the string the first variable was.
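To make the "remember how far we got" idea concrete, here's a hypothetical C sketch. dq_scan_result, scan_dq_string() and starts_var() are invented names, not the patch's code. Variable detection is simplified to "$" followed by a label-start byte ([a-zA-Z_] or a byte >= 0x7f), "{$", and "${".

```c
#include <stddef.h>

typedef struct {
    int is_constant; /* 1: no variables, the whole string is one token */
    size_t scanned;  /* bytes consumed: through the closing quote if constant,
                        otherwise up to (not including) the first variable */
} dq_scan_result;

/* Simplified check for the start of an interpolated variable. */
static int starts_var(const char *p, const char *limit)
{
    if (p + 1 >= limit) return 0;
    if (p[0] == '$') {
        unsigned char c = (unsigned char)p[1];
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
               c == '_' || c >= 0x7f || c == '{';
    }
    return p[0] == '{' && p[1] == '$';
}

/* Hypothetical sketch: scan a double-quoted string body starting just
 * after the opening quote. */
static dq_scan_result scan_dq_string(const char *start, const char *limit)
{
    const char *cursor = start;
    dq_scan_result r = {0, 0};
    while (cursor < limit) {
        if (*cursor == '\\' && cursor + 1 < limit) {
            cursor += 2; /* skip an escaped byte, e.g. \" or \$ */
            continue;
        }
        if (*cursor == '"') {
            r.is_constant = 1;
            r.scanned = (size_t)(cursor + 1 - start);
            return r;
        }
        if (starts_var(cursor, limit)) {
            /* Non-constant: remember the length of the constant prefix so
             * it can be skipped over later instead of being rescanned. */
            r.scanned = (size_t)(cursor - start);
            return r;
        }
        cursor++;
    }
    r.scanned = (size_t)(limit - start); /* unterminated: no closing quote */
    return r;
}
```

In this model, a caller that gets `is_constant == 0` would return just the `"` token and keep `r.scanned` in its state. The next call can then emit the constant prefix as its own token directly, so the cost no longer grows with how far into the string the first variable sits.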
I think that's about all; I'll send another message if I forgot to mention
anything... I just wanted to send this along quickly for you guys to look at
or whatever. It was basically done last week; I just had to add a couple of
finishing touches and verify that everything was OK.
http://realplain.com/php/scanner_diet.diff (Merged changes, but haven't
tested it yet.)
http://realplain.com/php/scanner_diet_5_3.diff
Thanks,
Matt
--
PHP Internals - PHP Runtime Development Mailing List