[^] is a special case in re2c: it is the portable way to write "match any
character".

Scott

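For context on why the switch from [^\x00] to [^] matters, here is a minimal sketch in C of a length-bounded "manual scan" for the end of a comment. It is not taken from the patch; find_comment_end and its parameters are hypothetical names chosen for illustration. The point is only that NUL bytes are treated as ordinary input and the scan stops at an explicit limit rather than at a sentinel byte, which is what a scanner needs when the mmap()ed buffer has no trailing NUL.

```c
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only (not from the patch).
 * Scans [p, limit) for the comment terminator and returns a pointer
 * just past it, or NULL if the terminator is not found before limit.
 * A NUL byte is an ordinary character here; only `limit` ends the scan.
 */
const char *find_comment_end(const char *p, const char *limit)
{
    while (limit - p >= 2) {
        if (p[0] == '*' && p[1] == '/') {
            return p + 2;   /* first byte after the terminator */
        }
        p++;
    }
    return NULL;            /* unterminated comment */
}
```

A rule built on [^\x00] would stop at the first NUL byte inside the comment, whereas a bounded scan like this one keeps going until the terminator or the end of the buffer.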
Dmitry Stogov wrote:
> Hi Matt,
>
> Does this patch fix EOF handling issues related to mmap()? (e.g. parsing
> of files with size 4096, 8192, ...). Now we have two dirty fixes to
> handle them correctly.
>
> The patch is quite big to understand it quickly. I'll probably take a
> look on the weekend.
>
> -ANY_CHAR [^\x00]
> +ANY_CHAR [^]
>
> Is [^] a correct regular expression?
>
> Thanks. Dmitry.
>
> Matt Wilmas wrote:
>> Hi Dmitry, Brian, all,
>>
>> Here's a scanner patch that I mentioned a while ago, with a possible
>> way to work around the re2c EOF handling issues.
>>
>> The primary change is to do a "manual scan" like I talked about in
>> areas that match large amounts and can contain NULL bytes
>> (strings/comments, which are now scanned faster too), as is done for
>> inline HTML. I called it a "diet" :-) because it removes my
>> complicated string regex patterns from a couple years ago, which
>> doesn't make the .l file much smaller after adding the manual scan
>> code (easier to understand...?), but it does result in a ~34k
>> reduction of 5.3's generated .c file...
>>
>> This fixes Bug #46817, as well as a better, more proper fix for the
>> older Bug #42767, both related to ending comments.
>>
>> Now inline HTML chunks aren't broken up when a tag starting with "s"
>> is encountered (<script> for JS, <span>, etc.), since it's unlikely to
>> be a long PHP <script> tag.
>>
>> If an opening PHP <SCRIPT> tag was used with a capital "S", it was
>> missed if it wasn't the first thing scanned:
>>
>> var_dump(token_get_all("HTML...
>> <SCRIPT language=php>"));
>>
>> Single-line comments with a Windows newline didn't include the full \r\n:
>>
>> var_dump(token_get_all("<?php // Comment\r\n?>"));
>>
>> Finally, part of the optimized scanning is that, for double quoted
>> strings, when the first variable is encountered (making it
>> non-constant), the amount that's been scanned up to that point is
>> remembered, which can then be skipped over (up to the variable) after
>> returning the quote token. Previously that initial part of the string
>> was rescanned -- the cost dependent on how far "into" the string the
>> first var is.
>>
>>
>> I think that's about all -- I'll send another message if I forgot to
>> mention anything... Just wanted to send this along quick to you
>> guys to look at or whatever. It was basically done last week, I just
>> had to do a couple finishing touches and verify that everything was OK.
>>
>> http://realplain.com/php/scanner_diet.diff (Merged changes, but didn't
>> test yet.)
>> http://realplain.com/php/scanner_diet_5_3.diff
>>
>>
>> Thanks,
>> Matt
>

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php