Re: [Toybox] [patch] add grep

2013-07-20 Thread Strake
Felix felix.ja...@posteo.de wrote:
 The *error_[msg,exit] functions should be used instead of [err,warn]x from 
 err.h.  These functions use xexit() instead of exit(). In xexit() calls 
 exit() unless
 toys.rebound nonzero (e.g. when invoked from toysh) and in this case it
 long_jmps to toys.rebound.

 The global variable c seems to have a similar purpose as toys.exitval. Is it 
 necessary?

I doubt it. See below.

Isaac idun...@lavabit.com wrote:
 I added some more flags: EFhinovclqe*f*m#
 what the kernel needs: EFabhiovwqd*e*m#
 (just for reference)
 -a is treat anything as text

Would be no-op, so will ignore.

 -b is print binary offset
 -d (read|skip|recurse) is a state switch that looks like insanity.
 -w matches a whole word.

 Too, I moved it to pending, and disabled it by default.

 I may hack it further later, but I already feel dirty for adding so
 many features.

 Well, I compiled toybox with musl and gcc complains about using regoff_t
 as an int, so I took a look.
 I didn't dare to poke at that, but noticed the exit (c).
 This would break toysh and anything else that may xexec() grep, so I
 started
 writing a patch to use xexit/error_exit/toys.exitval.

 By now I've also added -Hs because they were trivial.
 Mind if I submit it?

No; please do.

 In my testing it seemed that -e and -f don't work;
 it looks like they're not hooked up?

Just noticed a double free in -f code, derp. Unbroke that. Works for
me now at least. Will send when I merge yours.

 I noticed one oddity in your style: is there any reason to use
 function (args); //normal is function(args) ?

Yes: to my eyes, it looks crowded with no space, and is easier to read
with a space.

 The latter style makes a regex search for functions simpler.

I just use the function name.
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


[Toybox] [PATCH] grep: cleanup, add -Hs

2013-07-20 Thread Isaac
toybox uses xexit and error_exit instead of the standard C approach; this
allows toys to be called without starting a new process.
toys.exitval is the return value on exit.
loopfiles will read from stdin if nothing is specified.
POSIX specifies that grep shall by default print the line when only one 
input is specified. This can be handled by pretending we got -h 
if there are not multiple files.
-s is equivalent to setting failok in loopfiles_rw.

--
HTH,
Isaac Dunham
diff -r 1cf9c28012a7 toys/pending/grep.c
--- a/toys/pending/grep.c	Wed Jul 17 17:27:14 2013 -0500
+++ b/toys/pending/grep.c	Sat Jul 20 08:14:17 2013 -0700
@@ -5,13 +5,13 @@
  * See http://pubs.opengroup.org/onlinepubs/9699919799/utilities/
  * See http://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/cmdbehav.html
 
-USE_GREP(NEWTOY(grep, EFhinovclqe*f*m#, TOYFLAG_BIN))
+USE_GREP(NEWTOY(grep, EFhHinosvclqe*f*m#, TOYFLAG_BIN))
 
 config GREP
   bool grep
   default n
   help
-usage: grep [-clq] [-EFhinov] (-e RE | -f REfile | RE) [file...]
+usage: grep [-clq] [-EFhinosv] (-e RE | -f REfile | RE) [file...]
 
 modes:
   default: print lines from each file what match regular expression RE.
@@ -22,10 +22,12 @@
 flags:
   -E:   extended RE syntax
   -F:   fixed RE syntax, i.e. all characters literal
-  -h:   not print file name
+  -h:   do not print file name
+  -H:   print file name
   -i:   case insensitive
   -n:   print line numbers
   -o:   print only matching part
+  -s:   do not print file read errors
   -v:   invert match
 */
 
@@ -34,9 +36,6 @@
 #include regex.h
 #include err.h
 
-/* could be in GLOBALS but so need initialization code */
-static int c = 1;
-
 static regex_t re; /* fails in GLOBALS */
 
 GLOBALS(
@@ -60,10 +59,11 @@
 
 while (regexec (re, y, 1, match, atBOL ? 0 : REG_NOTBOL) == 0) {
   if (atBOL) nMatch++;
-  c = 0; atBOL = 0;
+  toys.exitval = 0;
+  atBOL = 0;
   switch (TT.mode) {
   case 'q':
-exit (0);
+xexit ();
   case 'l':
 if (!(toys.optflags  FLAG_h)) printf (%s\n, name);
 free (x);
@@ -136,7 +136,7 @@
   }
 
   if (!re_xs) {
-if (toys.optc  1) errx (2, no RE);
+if (toys.optc  1) error_exit (no RE);
 re_xs = toys.optflags  FLAG_F ? regfix (toys.optargs[0]) : toys.optargs[0];
 toys.optc--; toys.optargs++;
   }
@@ -144,7 +144,7 @@
   if (regcomp (re, re_xs,
(toys.optflags  (FLAG_E | FLAG_F) ? REG_EXTENDED : 0) |
(toys.optflags   FLAG_i   ? REG_ICASE: 0)) != 0) {
-errx (2, bad RE);
+error_exit(bad RE);
   }
 }
 
@@ -154,9 +154,9 @@
   if (toys.optflags  FLAG_c) TT.mode = 'c';
   if (toys.optflags  FLAG_l) TT.mode = 'l';
   if (toys.optflags  FLAG_q) TT.mode = 'q';
+  if (!(toys.optflags  FLAG_H)  (toys.optc  2))
+toys.optflags = toys.optflags | FLAG_h;
 
-  if (toys.optc  0) loopfiles (toys.optargs, do_grep);
-  else do_grep (0, -);
-
-  exit (c);
+  loopfiles_rw (toys.optargs, O_RDONLY, 0, (toys.optflags  FLAG_s), do_grep);
+  xexit ();
 }
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


[Toybox] [CLEANUP] ifconfig commit 957

2013-07-20 Thread Rob Landley

http://landley.net/hg/toybox/rev/957

It's been a while. I got a bit blocked on get_sockaddr() cleanup,  
because although it's only ever called twice (once for ipv6 and once  
for ipv4), I don't understand what the addresses it's parsing should  
look like. For example, does local: apply to just ipv6 of both ipv6  
and ipv4? The [ipv6]:port thing... can I put an ipv4 address in that  
first bit? Would anyone ever do that? Does just :port by itself apply  
to ipv4?


I dunno what addresses people feed to this, or why ifconfig eth0 add  
addr is ipv6 but just ifconfig eth0 addr couldnt _also_ be ipv6?  
(Why is ipv6 special requiring the add keyword?) The man page for  
ifconfig isn't hugely revealing, so I don't know how to untangle this  
function yet.


But I need to finish this cleanup at some point, so here's another stab  
at it:


The socklen field of sockaddr_with_len is only ever assigned to,  
never read. So it can go. This pretty STRONGLY implies that the rest of  
sockaddr_with_len can just be feed the right sockaddr_in or  
sockaddr_in6 to the function and we don't need the wrapper type, but  
see don't understand the addresses being parsed here. Need to find or  
create some test code with use cases...


I spent a while cleaning up address_to_name before realizing... it's  
never used. The whole function can just be deleted.


add_iface_to_list():

There's no real reason for a linked list of interfaces here except to  
eliminate duplicates between the two interface scanning methods. We  
don't need to defer interface processing, we just need a simple string  
list of interface names to skip ones we've already seen.


The only two callers to add_iface_to_list are show_iface() and  
readconf(), and the only caller of readconf is the else case of  
show_iface(). So start by inlining readconf() in show_iface().


A common source of code bloat is bureaucracy, I.E. filling out forms  
and then reading the data back out of the forms. It's simpler to just  
use the data at the point it's generated rather than creating  
unnecessary structures to store and transport it from producer to  
consumer.


In this case it applies to both creating an unnecessary list, _and_ to  
get_device_info being separate from display_ifconfig(). The only  
consumer of get_device_info() output is display_ifconfig(), and it's a  
1:1 ratio, so they might as well be the same function without an  
unnecessary structure passing data between them. (This may involve  
moving some of the show this or not decisions into the combined  
function.)


A symptom of this unnecessary layering is show_iface() and  
display_ifconfig(). Going by the names, could you guess what the  
difference between those functions is? I couldn't. Show and display  
mean approximately the same thing.


Execution starts in show_iface(), which receives an interface name as  
its argument. Convert struct if_list il to just a local structure on  
the stack, where we fill out one instance and immediately display or  
discard it. Add a struct string_list *ifaces to hold duplicate  
interface names. So the list is nothing _but_ names, and the data  
structure only has one instance.


Make the first loop traversing /proc/net/dev actually display the  
interface in the loop. When we're looking for a specific interface  
displaying it means we return then, otherwise add its name to the  
duplicate checking list, display it if it's up or we have -a, and  
continue on.


That means the if (iface_name) test after the loop means we didn't find  
it in /proc/net/dev, so just have a get/display pair on the supplied  
name there. The else case used to be readconf, but we're swapping in  
the string list instead of the old one for the duplciate skipping, and  
then an adjacent get_device_info()/display_ifconfig() operating on the  
one local copy of interface_list.


So we don't need a display everything loop at the end: by the time we  
get there we've displayed everything we're going to. We also don't need  
to free iface_list, instead we free the new string_list.


At this point we've paired up all the  
get_device_info()/display_ifconfig() calls and have them each using a  
structure with a lifetime of just those two lines. This means on the  
next pass we can combine those functions and disintegrate the structure.


I changed the calling convention of get_device_info and  
display_ifconfig to take the name of the interface as a separate  
argument, rather than part of the structure, because the name is the  
main thing it should need when the structure goes away. (It'll also be  
able to take the val[] array, but that can be NULL when it's not  
available.) I had to fix a conflict in display_ifconfig that was  
already using char *name for ipv6 scope display: replaced it with  
scope, and then there was already a scope int value so I replaced  
_that_ with iscope.


Ifconfig is now down under 600 lines. (Closing in on my original  
estimate of this much 

[Toybox] [CLEANUP] paste

2013-07-20 Thread Rob Landley
Description of the paste cleanup that accidentally got checked in  
during commit 944 because hg import -f has side effects.


  http://landley.net/hg/toybox/rev/944

(Yes, that commit description is actively ironic. Scroll down to  
patch.c.)


Minor tweaks to the first loop: if -d isn't set then TT.delim will be  
NULL, and as long as that's the value we're playing with anyway the  
optimizer tends to do better if it can load that into a register and  
perform several operations on it, rather than load/mask/test an  
otherwise unrelated integer. So yank the test and instead start the  
loop with p = TT.delim ? TT.delim : \t; (In theory gcc offers an x ?  
: y; syntax where the blank between ? : substitutes in the result of  
the test again, so I _could_ say p = TT.delim ? : \t;. In practice,  
c99 never picked that one up so I feel uncomfortable using it.)


I was on the fence about moving the assignment from the else case into  
the if, but did it anyway, so:


-if (*p == '\\') {
-} else *buf = *p;

becomes:

+  if ((*buf = *p) == '\\') {
+  }

This means we can load *p into a register once and then perform both  
the assignment and the test from that register. The initial write only  
goes into L1 cache and if it gets overwritten a couple lines later we  
only have one write to main memory. In general modern processors have a  
better time eliding writes than predicting branches. A lot of stuff  
like:


  if (x) y = 1;
  else y = 2;

Is going to get rewritten by the optimizer into:

  y = 2;
  if (x) y = 1;

So the if (x) can become a conditional assignment instruction instead  
of a branch instruction. (This avoids bubbles in the instruction  
pipeline due to branch mispredicts, and/or wasted work from speculative  
execution on mispredicted branches eating your battery power and then  
being discarded.)


I was on the fence because this is phrasing easier for the computer to  
do, and less repetitive (only one mention of *p instead of 2), but  
slightly less easy to read.


Turned the p++ on its own line into a ++p in the stridx.

Changed the error message to say bad -d instead of bad delimiter so  
it's one less word to translate to other languages.


Moved the // Sequential comment from on the same line to the line  
before. I know it uses more vertical screen space and I ordinarily try  
to conserve that, but mixing comment and code on the same line is  
something I only really do describing data values. (Describing  
structure members, table entries, or magic constants when a symbol name  
is not conveniently available in a header.) This comment is more what  
does this block of code do, and that I prefer to put on its own line.


This loop is what loopfiles() is for. That handles the aliasing of -  
with stdin and everything.


(No, I'm probably not done cleaning up paste, this was just my initial  
pass that got accidentally checked in.)


Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] [PATCH] grep: cleanup, add -Hs

2013-07-20 Thread Isaac
On Sat, Jul 20, 2013 at 11:42:48AM -0500, Strake wrote:
 # HG changeset patch
 # User Strake
 # Date 1374338194 18000
 # Node ID 24cd187521c1c89f16cddb9b346c1da7e900ecdd
 # Parent  1cf9c28012a76dd30cd1a7fcba8251b189d7df5a
 grep
 
 partly due to Isaac Dunham

See inline comments below.
 
 diff -r 1cf9c28012a7 -r 24cd187521c1 toys/pending/grep.c
 --- a/toys/pending/grep.c Wed Jul 17 17:27:14 2013 -0500
 +++ b/toys/pending/grep.c Sat Jul 20 11:36:34 2013 -0500
 @@ -5,13 +5,13 @@
   * See http://pubs.opengroup.org/onlinepubs/9699919799/utilities/
   * See 
 http://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/cmdbehav.html
 
 -USE_GREP(NEWTOY(grep, EFhinovclqe*f*m#, TOYFLAG_BIN))
 +USE_GREP(NEWTOY(grep, EFHahinosvclqe*f*m#, TOYFLAG_BIN))
 
  config GREP
bool grep
default n
help
 -usage: grep [-clq] [-EFhinov] (-e RE | -f REfile | RE) [file...]
 +usage: grep [-clq] [-EFHhinosv] (-e RE | -f REfile | RE) [file...]
 
  modes:
default: print lines from each file what match regular expression RE.
 @@ -22,20 +22,18 @@
  flags:
-E:   extended RE syntax
-F:   fixed RE syntax, i.e. all characters literal
 +  -H:   print file name
-h:   not print file name
-i:   case insensitive
-n:   print line numbers
-o:   print only matching part
 +  -s:   keep silent on error
-v:   invert match
  */
 
  #define FOR_grep
  #include toys.h
  #include regex.h
 -#include err.h
 -
 -/* could be in GLOBALS but so need initialization code */
 -static int c = 1;
 
  static regex_t re; /* fails in GLOBALS */
 
 @@ -60,10 +58,12 @@
 
  while (regexec (re, y, 1, match, atBOL ? 0 : REG_NOTBOL) == 0) {
if (atBOL) nMatch++;
 -  c = 0; atBOL = 0;
 +  toys.exitval = 0;
 +  atBOL = 0;
switch (TT.mode) {
case 'q':
 -exit (0);
 +toys.exitval = 0;

I don't think switch(TT.mode) is changing toys.exitval.

 +xexit ();
case 'l':
  if (!(toys.optflags  FLAG_h)) printf (%s\n, name);
  free (x);
 @@ -121,7 +121,8 @@
  for (;;) {
if (getline (x, l, f)  0) {
  if (feof (f)) break;
 -err (2, failed to read);
 +toys.exitval = 2;
 +perror_exit (failed to read);

I'd think this should be warn and see if there are more REs...
But I could be wrong.

And this reminds me:
Any particular reason that buildRE() needs to be a one-call monster that 
does all the regexes in a single pass?
I would have thought that the code used to parse TT.eArgu could be shared 
with parsing each line of TT.fArgu.

Also, Rob: does error_exit really need toys.exitval to be set?

}
y = x + strlen (x) - 1;
if (y[0] == '\n') y[0] = 0;
 @@ -129,14 +130,17 @@
y = toys.optflags  FLAG_F ? regfix (x) : x;
if (re_xs) re_xs = xastrcat (re_xs, |);
re_xs = xastrcat (re_xs, y);
 -  free (y);
 +  if (toys.optflags  FLAG_F) free (y);
  }
  free (x);
  fclose (f);
}
 
if (!re_xs) {
 -if (toys.optc  1) errx (2, no RE);
 +if (toys.optc  1) {
 +  toys.exitval = 2;
 +  error_exit (no RE);
 +}
  re_xs = toys.optflags  FLAG_F ? regfix (toys.optargs[0]) :
 toys.optargs[0];
  toys.optc--; toys.optargs++;
}
 @@ -144,7 +148,8 @@
if (regcomp (re, re_xs,
 (toys.optflags  (FLAG_E | FLAG_F) ? REG_EXTENDED : 0) |
 (toys.optflags   FLAG_i   ? REG_ICASE: 0)) != 0) 
 {

You can remove the != 0.

 -errx (2, bad RE);
 +toys.exitval = 2;
 +error_exit (bad RE);
}
  }
 
 @@ -155,8 +160,9 @@
if (toys.optflags  FLAG_l) TT.mode = 'l';
if (toys.optflags  FLAG_q) TT.mode = 'q';
 
 -  if (toys.optc  0) loopfiles (toys.optargs, do_grep);
 -  else do_grep (0, -);
 +  if (!(toys.optflags  FLAG_H)  (toys.optc  2)) toys.optflags |= FLAG_h;
 
 -  exit (c);
 +  toys.exitval = 1;
 +  loopfiles_rw (toys.optargs, O_RDONLY, 0, toys.optflags  FLAG_s, do_grep);
 +  xexit ();
  }

___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] [CLEANUP] paste

2013-07-20 Thread Felix Janda
Rob Landley wrote:
 Description of the paste cleanup that accidentally got checked in  
 during commit 944 because hg import -f has side effects.
 
http://landley.net/hg/toybox/rev/944
 
 (Yes, that commit description is actively ironic. Scroll down to  
 patch.c.)
[...]

The accidental cleanup checked in does not seem to reflect what you
describe in the mail. It looks like a zeroth round of cleanup fixing
only some whitespace.

Thanks,
Felix
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net