On 1/17/24 18:57, Rob Landley wrote: > On 1/17/24 15:16, Peter Collingbourne via Toybox wrote: > Thanks. Digging into it...
Looks like I screwed up --color when adding the bucket sort logic. (I haven't got regression tests for --color, because unless I'm exactly matching the ascii escape sequences I'd just be filtering them out in the test...) The --color logic is a repurposed variant of the -o logic, and what's happening here is the regex-alikes are matching "a", then vetoing it because of the extra -w stipulations, then doing the right thing for -o (not outputting it) instead of the right thing for --color (extending the not-match output). The logic found the best match and then did a FLAG(w) check afterwards to see if it's a WORD match, and the problem is it skipped it if so, which advanced past it, which is why it didn't get printed. Used to work with the old logic, but inappropriate now with the new bucket sort stuff. The fix is to move the FLAG(w) check into the two loops (fixed and regex matches) so it vetoes each match immediately, and we never advance to the "skip forward past this" logic for something that isn't a match. I made a function that both can call so the code isn't TOO badly duplicated. The hangup is that my first stab at fixing this broke the two -w '' tests, and then I got confused staring at: // Empty pattern always matches if (rc && *TT.fixed && !FLAG(o)) rc = 0; Because it LOOKS backwards: if rc is set and the first character of fixed is set switch the match _off_? But rc is using syscall logic return code logic here, where 0 means success and 1 means failure (because that's what regexec() returns so it's all doing that). And then I had to reread parse_regex() to remind myself that TT.fixed is an array of arrays (bucket sort, indexed by first letter), and that fixed[0] means the first (non-special) character is zero, so the above check is asking "do we have any zero length fixed match strings". (I left myself a comment but it wasn't a BIG ENOUGH comment). And THEN it turns out I never quite understood the -w logic in the first place, and it took me a while to come up with even more tests: $ echo | grep 'a' $ echo | grep '' $ echo 'one' | grep -w '' $ echo 'one two' | grep -w '' $ echo 'one two' | grep -w '' one two $ echo 'one ' | grep -w '' one $ echo ' one' | grep -w '' one $ echo potato | grep '^' potato $ echo potato | grep '$' potato $ echo 'one two' | grep -w '^' $ echo 'one two' | grep -w '$' So without -w the strings ^ and $ match any line (because every line has a beginning and an end), but ^$ only matches an empty line. With -w the empty string only matches either an empty line or a line with two spaces in it, and ^ matches an empty string OR a line with a space as the first character, and $ matches the empty string OR a line with a space as the last character. Which means that empty string checking snippet ALSO needs to move inside the loop with the rest of the -w logic. At first I thought it doesn't move inside the function because empty strings are always fixed matches (if there are any unescaped special characters to trigger regex plumbing, it's not a fixed match). But then I thought of: $ echo 'one two' | grep -w 'a*' $ echo 'one two' | grep -w 'a*' one two Because zero or more occurrences of "a" is a zero length match and of course I have to police it for -w with ^ and $. So if you're wondering why that's taking so long to fix... it's because my test suite didn't have enough tests and I had work out how to add more. Rob _______________________________________________ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net