2009/4/30 Dmitry Stogov <dmi...@zend.com>:
> Hi Matt,
>
> Does this patch fix EOF handling issues related to mmap()? (e.g. parsing of
> files with size 4096, 8192, ...). Now we have two dirty fixes to handle them
> correctly.
>
> The patch is too big to understand quickly. I'll probably take a look
> over the weekend.
>
> -ANY_CHAR [^\x00]
> +ANY_CHAR [^]
>
> Is [^] a correct regular expression?
>
> Thanks. Dmitry.
>
> Matt Wilmas wrote:
>>
>> Hi Dmitry, Brian, all,
>>
>> Here's a scanner patch that I mentioned awhile ago, with a possible way to
>> work around the re2c EOF handling issues.
>>
>> The primary change is to do a "manual scan" like I talked about in areas
>> that match large amounts and can contain NULL bytes (strings/comments, which
>> are now scanned faster too), as is done for inline HTML. I called it a
>> "diet" :-) because it removes my complicated string regex patterns from a
>> couple years ago, which doesn't make the .l file much smaller after adding
>> the manual scan code (easier to understand...?), but it does result in a
>> ~34k reduction of 5.3's generated .c file...
>>
>> This fixes Bug #46817, as well as a better, more proper fix for the older
>> Bug #42767, both related to ending comments.
>>
>> Now inline HTML chunks aren't broken up when a tag starting with "s" is
>> encountered (<script> for JS, <span>, etc.), since it's unlikely to be a
>> long PHP <script> tag.
>>
>> If an opening PHP <SCRIPT> tag was used with a capital "S", it was missed
>> if it wasn't the first thing scanned:
>>
>> var_dump(token_get_all("HTML...
>> <SCRIPT language=php>"));
>>
>> Single-line comments with a Windows newline didn't include the full \r\n:
>>
>> var_dump(token_get_all("<?php // Comment\r\n?>"));
>>
>> Finally, part of the optimized scanning is that, for double quoted
>> strings, when the first variable is encountered (making it non-constant),
>> the amount that's been scanned up to that point is remembered, which can
>> then be skipped over (up to the variable) after returning the quote token.
>> Previously that initial part of the string was rescanned -- the cost
>> dependent on how far "into" the string the first var is.
>>
>>
>> I think that's about all -- I'll send another message if I forgot to
>> mention anything... Just wanted to send this along quick for you guys to
>> look at or whatever. It was basically done last week, I just had to do a
>> couple finishing touches and verify that everything was OK.
>>
>> http://realplain.com/php/scanner_diet.diff (Merged changes, but didn't
>> test yet.)
>> http://realplain.com/php/scanner_diet_5_3.diff
>>
>>
>> Thanks,
>> Matt
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>
Hmm. RegexBuddy explains that as ...

    Match any character that is NOT a “]”

But it sure doesn't look valid without a closing ].

--
-----
Richard Quadling
Zend Certified Engineer : http://zend.com/zce.php?c=ZEND002498&r=213474731
"Standing on the shoulders of some very clever giants!"
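
For what it's worth, the RegexBuddy reading can be checked against PCRE from PHP itself. This is only a minimal sketch, assuming a stock PHP build whose preg_* functions use PCRE without any empty-class or JavaScript-compatibility option. The helper name pcre_result() is made up for illustration and is not part of the patch. It says nothing about how re2c parses [^] in the .l file, which is what Dmitry's question is really about.

```php
<?php
// Hypothetical helper (not from the patch): reports how PCRE, via PHP's
// preg_match(), treats a pattern when applied to a subject string.
function pcre_result(string $pattern, string $subject): string
{
    // @ suppresses the E_WARNING that a PCRE compile error raises.
    $r = @preg_match($pattern, $subject);
    if ($r === false) {
        return 'invalid pattern';
    }
    return $r === 1 ? 'match' : 'no match';
}

// "]" directly after "[^" is taken literally, so the class is never closed.
echo pcre_result('/[^]/', 'a'), "\n";

// With a second "]" the class is complete: any character except "]".
echo pcre_result('/[^]]/', 'a'), "\n";
echo pcre_result('/[^]]/', ']'), "\n";
```

On such a build I would expect the first pattern to be rejected and the other two to report a match and a non-match respectively, which agrees with the "NOT a ]" explanation but only for PCRE-style engines.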