On Sat, May 03, 2014 at 03:17:49PM +0200, Denys Vlasenko wrote:
> On Saturday 03 May 2014 05:10, Rich Felker wrote:
> > On Wed, Apr 30, 2014 at 10:31:00AM +0200, Natanael Copa wrote:
> > > Hi,
> > >
> > > I came across a bug (or feature) in busybox sed when trying to build
> > > firefox-29.
> > >
> > > Testcase based on what firefox's configure scripts does:
> > >
> > > ASCII='AA'
> > > NONASCII=$'\246\246'
> > >
> > > echo -e "($ASCII)\n($NONASCII)" | busybox sed 's/$/,/'
> >
> > The above script is invalid; \246\246 is an illegal sequence and thus
> > is rejected by regexec. It will work only on non-UTF-8 systems/locales
> > (which musl does not support).
>
> Lets refuse to find end of line if there is a non UTF-8 sequence inside that
> line?
> Sounds wrong to me...

sed (as well as regcomp and regexec) requires text input. Byte streams
with illegal sequences are not text.

Actually, since the regex is not trying to match the illegal sequence,
just the end-of-line, it would theoretically be possible to make this
work (and it will once we overhaul the regex implementation to work
with byte-based DFAs rather than character-based ones), but that
doesn't change the fact that it's an invalid test.

Rich
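
To make the distinction concrete, here is a minimal Python sketch. It is
only an illustration under stated assumptions, not the musl or busybox
code: the helper names is_valid_utf8 and append_comma_bytewise are
hypothetical, and Python's bytes-mode re module stands in for a
byte-based matcher. It shows that \246\246 (two 0xA6 bytes) is not valid
UTF-8, while a matcher that works on raw bytes can still find the end of
each line.

```python
import re

ASCII = b"AA"
NONASCII = b"\246\246"  # octal escapes: two 0xA6 bytes, as in the test case


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def append_comma_bytewise(stream: bytes) -> bytes:
    """Apply the equivalent of sed 's/$/,/' to each line, matching on bytes.

    Hypothetical helper: the pattern is a bytes pattern, so no character
    decoding takes place and illegal sequences cannot cause a failure.
    """
    out = []
    for line in stream.splitlines(keepends=True):
        newline = b"\n" if line.endswith(b"\n") else b""
        body = line[: len(line) - len(newline)]
        # `$` matches once, at the end of the newline-free line body.
        out.append(re.sub(rb"$", b",", body, count=1) + newline)
    return b"".join(out)


if __name__ == "__main__":
    data = b"(" + ASCII + b")\n(" + NONASCII + b")\n"
    print(is_valid_utf8(ASCII), is_valid_utf8(NONASCII))
    print(append_comma_bytewise(data))
```

A character-based matcher has to decode each line before matching, which
is where the illegal sequence gets rejected; the byte-based variant never
decodes, so the end-of-line match succeeds on both lines.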
