Re: [Toybox] [patch] add grep
Felix felix.ja...@posteo.de wrote: The *error_[msg,exit] functions should be used instead of [err,warn]x from err.h. These functions use xexit() instead of exit(). In xexit() calls exit() unless toys.rebound nonzero (e.g. when invoked from toysh) and in this case it long_jmps to toys.rebound. The global variable c seems to have a similar purpose as toys.exitval. Is it necessary? I doubt it. See below. Isaac idun...@lavabit.com wrote: I added some more flags: EFhinovclqe*f*m# what the kernel needs: EFabhiovwqd*e*m# (just for reference) -a is treat anything as text Would be no-op, so will ignore. -b is print binary offset -d (read|skip|recurse) is a state switch that looks like insanity. -w matches a whole word. Too, I moved it to pending, and disabled it by default. I may hack it further later, but I already feel dirty for adding so many features. Well, I compiled toybox with musl and gcc complains about using regoff_t as an int, so I took a look. I didn't dare to poke at that, but noticed the exit (c). This would break toysh and anything else that may xexec() grep, so I started writing a patch to use xexit/error_exit/toys.exitval. By now I've also added -Hs because they were trivial. Mind if I submit it? No; please do. In my testing it seemed that -e and -f don't work; it looks like they're not hooked up? Just noticed a double free in -f code, derp. Unbroke that. Works for me now at least. Will send when I merge yours. I noticed one oddity in your style: is there any reason to use function (args); //normal is function(args) ? Yes: to my eyes, it looks crowded with no space, and is easier to read with a space. The latter style makes a regex search for functions simpler. I just use the function name. ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
[Toybox] [PATCH] grep: cleanup, add -Hs
toybox uses xexit and error_exit instead of the standard C approach; this allows toys to be called without starting a new process. toys.exitval is the return value on exit. loopfiles will read from stdin if nothing is specified. POSIX specifies that grep shall by default print the line when only one input is specified. This can be handled by pretending we got -h if there are not multiple files. -s is equivalent to setting failok in loopfiles_rw. -- HTH, Isaac Dunham diff -r 1cf9c28012a7 toys/pending/grep.c --- a/toys/pending/grep.c Wed Jul 17 17:27:14 2013 -0500 +++ b/toys/pending/grep.c Sat Jul 20 08:14:17 2013 -0700 @@ -5,13 +5,13 @@ * See http://pubs.opengroup.org/onlinepubs/9699919799/utilities/ * See http://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/cmdbehav.html -USE_GREP(NEWTOY(grep, EFhinovclqe*f*m#, TOYFLAG_BIN)) +USE_GREP(NEWTOY(grep, EFhHinosvclqe*f*m#, TOYFLAG_BIN)) config GREP bool grep default n help -usage: grep [-clq] [-EFhinov] (-e RE | -f REfile | RE) [file...] +usage: grep [-clq] [-EFhinosv] (-e RE | -f REfile | RE) [file...] modes: default: print lines from each file what match regular expression RE. @@ -22,10 +22,12 @@ flags: -E: extended RE syntax -F: fixed RE syntax, i.e. all characters literal - -h: not print file name + -h: do not print file name + -H: print file name -i: case insensitive -n: print line numbers -o: print only matching part + -s: do not print file read errors -v: invert match */ @@ -34,9 +36,6 @@ #include regex.h #include err.h -/* could be in GLOBALS but so need initialization code */ -static int c = 1; - static regex_t re; /* fails in GLOBALS */ GLOBALS( @@ -60,10 +59,11 @@ while (regexec (re, y, 1, match, atBOL ? 0 : REG_NOTBOL) == 0) { if (atBOL) nMatch++; - c = 0; atBOL = 0; + toys.exitval = 0; + atBOL = 0; switch (TT.mode) { case 'q': -exit (0); +xexit (); case 'l': if (!(toys.optflags FLAG_h)) printf (%s\n, name); free (x); @@ -136,7 +136,7 @@ } if (!re_xs) { -if (toys.optc 1) errx (2, no RE); +if (toys.optc 1) error_exit (no RE); re_xs = toys.optflags FLAG_F ? regfix (toys.optargs[0]) : toys.optargs[0]; toys.optc--; toys.optargs++; } @@ -144,7 +144,7 @@ if (regcomp (re, re_xs, (toys.optflags (FLAG_E | FLAG_F) ? REG_EXTENDED : 0) | (toys.optflags FLAG_i ? REG_ICASE: 0)) != 0) { -errx (2, bad RE); +error_exit(bad RE); } } @@ -154,9 +154,9 @@ if (toys.optflags FLAG_c) TT.mode = 'c'; if (toys.optflags FLAG_l) TT.mode = 'l'; if (toys.optflags FLAG_q) TT.mode = 'q'; + if (!(toys.optflags FLAG_H) (toys.optc 2)) +toys.optflags = toys.optflags | FLAG_h; - if (toys.optc 0) loopfiles (toys.optargs, do_grep); - else do_grep (0, -); - - exit (c); + loopfiles_rw (toys.optargs, O_RDONLY, 0, (toys.optflags FLAG_s), do_grep); + xexit (); } ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
[Toybox] [CLEANUP] ifconfig commit 957
http://landley.net/hg/toybox/rev/957 It's been a while. I got a bit blocked on get_sockaddr() cleanup, because although it's only ever called twice (once for ipv6 and once for ipv4), I don't understand what the addresses it's parsing should look like. For example, does local: apply to just ipv6 of both ipv6 and ipv4? The [ipv6]:port thing... can I put an ipv4 address in that first bit? Would anyone ever do that? Does just :port by itself apply to ipv4? I dunno what addresses people feed to this, or why ifconfig eth0 add addr is ipv6 but just ifconfig eth0 addr couldnt _also_ be ipv6? (Why is ipv6 special requiring the add keyword?) The man page for ifconfig isn't hugely revealing, so I don't know how to untangle this function yet. But I need to finish this cleanup at some point, so here's another stab at it: The socklen field of sockaddr_with_len is only ever assigned to, never read. So it can go. This pretty STRONGLY implies that the rest of sockaddr_with_len can just be feed the right sockaddr_in or sockaddr_in6 to the function and we don't need the wrapper type, but see don't understand the addresses being parsed here. Need to find or create some test code with use cases... I spent a while cleaning up address_to_name before realizing... it's never used. The whole function can just be deleted. add_iface_to_list(): There's no real reason for a linked list of interfaces here except to eliminate duplicates between the two interface scanning methods. We don't need to defer interface processing, we just need a simple string list of interface names to skip ones we've already seen. The only two callers to add_iface_to_list are show_iface() and readconf(), and the only caller of readconf is the else case of show_iface(). So start by inlining readconf() in show_iface(). A common source of code bloat is bureaucracy, I.E. filling out forms and then reading the data back out of the forms. It's simpler to just use the data at the point it's generated rather than creating unnecessary structures to store and transport it from producer to consumer. In this case it applies to both creating an unnecessary list, _and_ to get_device_info being separate from display_ifconfig(). The only consumer of get_device_info() output is display_ifconfig(), and it's a 1:1 ratio, so they might as well be the same function without an unnecessary structure passing data between them. (This may involve moving some of the show this or not decisions into the combined function.) A symptom of this unnecessary layering is show_iface() and display_ifconfig(). Going by the names, could you guess what the difference between those functions is? I couldn't. Show and display mean approximately the same thing. Execution starts in show_iface(), which receives an interface name as its argument. Convert struct if_list il to just a local structure on the stack, where we fill out one instance and immediately display or discard it. Add a struct string_list *ifaces to hold duplicate interface names. So the list is nothing _but_ names, and the data structure only has one instance. Make the first loop traversing /proc/net/dev actually display the interface in the loop. When we're looking for a specific interface displaying it means we return then, otherwise add its name to the duplicate checking list, display it if it's up or we have -a, and continue on. That means the if (iface_name) test after the loop means we didn't find it in /proc/net/dev, so just have a get/display pair on the supplied name there. The else case used to be readconf, but we're swapping in the string list instead of the old one for the duplciate skipping, and then an adjacent get_device_info()/display_ifconfig() operating on the one local copy of interface_list. So we don't need a display everything loop at the end: by the time we get there we've displayed everything we're going to. We also don't need to free iface_list, instead we free the new string_list. At this point we've paired up all the get_device_info()/display_ifconfig() calls and have them each using a structure with a lifetime of just those two lines. This means on the next pass we can combine those functions and disintegrate the structure. I changed the calling convention of get_device_info and display_ifconfig to take the name of the interface as a separate argument, rather than part of the structure, because the name is the main thing it should need when the structure goes away. (It'll also be able to take the val[] array, but that can be NULL when it's not available.) I had to fix a conflict in display_ifconfig that was already using char *name for ipv6 scope display: replaced it with scope, and then there was already a scope int value so I replaced _that_ with iscope. Ifconfig is now down under 600 lines. (Closing in on my original estimate of this much
[Toybox] [CLEANUP] paste
Description of the paste cleanup that accidentally got checked in during commit 944 because hg import -f has side effects. http://landley.net/hg/toybox/rev/944 (Yes, that commit description is actively ironic. Scroll down to patch.c.) Minor tweaks to the first loop: if -d isn't set then TT.delim will be NULL, and as long as that's the value we're playing with anyway the optimizer tends to do better if it can load that into a register and perform several operations on it, rather than load/mask/test an otherwise unrelated integer. So yank the test and instead start the loop with p = TT.delim ? TT.delim : \t; (In theory gcc offers an x ? : y; syntax where the blank between ? : substitutes in the result of the test again, so I _could_ say p = TT.delim ? : \t;. In practice, c99 never picked that one up so I feel uncomfortable using it.) I was on the fence about moving the assignment from the else case into the if, but did it anyway, so: -if (*p == '\\') { -} else *buf = *p; becomes: + if ((*buf = *p) == '\\') { + } This means we can load *p into a register once and then perform both the assignment and the test from that register. The initial write only goes into L1 cache and if it gets overwritten a couple lines later we only have one write to main memory. In general modern processors have a better time eliding writes than predicting branches. A lot of stuff like: if (x) y = 1; else y = 2; Is going to get rewritten by the optimizer into: y = 2; if (x) y = 1; So the if (x) can become a conditional assignment instruction instead of a branch instruction. (This avoids bubbles in the instruction pipeline due to branch mispredicts, and/or wasted work from speculative execution on mispredicted branches eating your battery power and then being discarded.) I was on the fence because this is phrasing easier for the computer to do, and less repetitive (only one mention of *p instead of 2), but slightly less easy to read. Turned the p++ on its own line into a ++p in the stridx. Changed the error message to say bad -d instead of bad delimiter so it's one less word to translate to other languages. Moved the // Sequential comment from on the same line to the line before. I know it uses more vertical screen space and I ordinarily try to conserve that, but mixing comment and code on the same line is something I only really do describing data values. (Describing structure members, table entries, or magic constants when a symbol name is not conveniently available in a header.) This comment is more what does this block of code do, and that I prefer to put on its own line. This loop is what loopfiles() is for. That handles the aliasing of - with stdin and everything. (No, I'm probably not done cleaning up paste, this was just my initial pass that got accidentally checked in.) Rob ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
Re: [Toybox] [PATCH] grep: cleanup, add -Hs
On Sat, Jul 20, 2013 at 11:42:48AM -0500, Strake wrote: # HG changeset patch # User Strake # Date 1374338194 18000 # Node ID 24cd187521c1c89f16cddb9b346c1da7e900ecdd # Parent 1cf9c28012a76dd30cd1a7fcba8251b189d7df5a grep partly due to Isaac Dunham See inline comments below. diff -r 1cf9c28012a7 -r 24cd187521c1 toys/pending/grep.c --- a/toys/pending/grep.c Wed Jul 17 17:27:14 2013 -0500 +++ b/toys/pending/grep.c Sat Jul 20 11:36:34 2013 -0500 @@ -5,13 +5,13 @@ * See http://pubs.opengroup.org/onlinepubs/9699919799/utilities/ * See http://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/cmdbehav.html -USE_GREP(NEWTOY(grep, EFhinovclqe*f*m#, TOYFLAG_BIN)) +USE_GREP(NEWTOY(grep, EFHahinosvclqe*f*m#, TOYFLAG_BIN)) config GREP bool grep default n help -usage: grep [-clq] [-EFhinov] (-e RE | -f REfile | RE) [file...] +usage: grep [-clq] [-EFHhinosv] (-e RE | -f REfile | RE) [file...] modes: default: print lines from each file what match regular expression RE. @@ -22,20 +22,18 @@ flags: -E: extended RE syntax -F: fixed RE syntax, i.e. all characters literal + -H: print file name -h: not print file name -i: case insensitive -n: print line numbers -o: print only matching part + -s: keep silent on error -v: invert match */ #define FOR_grep #include toys.h #include regex.h -#include err.h - -/* could be in GLOBALS but so need initialization code */ -static int c = 1; static regex_t re; /* fails in GLOBALS */ @@ -60,10 +58,12 @@ while (regexec (re, y, 1, match, atBOL ? 0 : REG_NOTBOL) == 0) { if (atBOL) nMatch++; - c = 0; atBOL = 0; + toys.exitval = 0; + atBOL = 0; switch (TT.mode) { case 'q': -exit (0); +toys.exitval = 0; I don't think switch(TT.mode) is changing toys.exitval. +xexit (); case 'l': if (!(toys.optflags FLAG_h)) printf (%s\n, name); free (x); @@ -121,7 +121,8 @@ for (;;) { if (getline (x, l, f) 0) { if (feof (f)) break; -err (2, failed to read); +toys.exitval = 2; +perror_exit (failed to read); I'd think this should be warn and see if there are more REs... But I could be wrong. And this reminds me: Any particular reason that buildRE() needs to be a one-call monster that does all the regexes in a single pass? I would have thought that the code used to parse TT.eArgu could be shared with parsing each line of TT.fArgu. Also, Rob: does error_exit really need toys.exitval to be set? } y = x + strlen (x) - 1; if (y[0] == '\n') y[0] = 0; @@ -129,14 +130,17 @@ y = toys.optflags FLAG_F ? regfix (x) : x; if (re_xs) re_xs = xastrcat (re_xs, |); re_xs = xastrcat (re_xs, y); - free (y); + if (toys.optflags FLAG_F) free (y); } free (x); fclose (f); } if (!re_xs) { -if (toys.optc 1) errx (2, no RE); +if (toys.optc 1) { + toys.exitval = 2; + error_exit (no RE); +} re_xs = toys.optflags FLAG_F ? regfix (toys.optargs[0]) : toys.optargs[0]; toys.optc--; toys.optargs++; } @@ -144,7 +148,8 @@ if (regcomp (re, re_xs, (toys.optflags (FLAG_E | FLAG_F) ? REG_EXTENDED : 0) | (toys.optflags FLAG_i ? REG_ICASE: 0)) != 0) { You can remove the != 0. -errx (2, bad RE); +toys.exitval = 2; +error_exit (bad RE); } } @@ -155,8 +160,9 @@ if (toys.optflags FLAG_l) TT.mode = 'l'; if (toys.optflags FLAG_q) TT.mode = 'q'; - if (toys.optc 0) loopfiles (toys.optargs, do_grep); - else do_grep (0, -); + if (!(toys.optflags FLAG_H) (toys.optc 2)) toys.optflags |= FLAG_h; - exit (c); + toys.exitval = 1; + loopfiles_rw (toys.optargs, O_RDONLY, 0, toys.optflags FLAG_s, do_grep); + xexit (); } ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
Re: [Toybox] [CLEANUP] paste
Rob Landley wrote: Description of the paste cleanup that accidentally got checked in during commit 944 because hg import -f has side effects. http://landley.net/hg/toybox/rev/944 (Yes, that commit description is actively ironic. Scroll down to patch.c.) [...] The accidental cleanup checked in does not seem to reflect what you describe in the mail. It looks like a zeroth round of cleanup fixing only some whitespace. Thanks, Felix ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net