Re: [Toybox] strings tests and bugfixes
On 06/14/2017 08:04 AM, Ilya Kuzmich wrote: > ping? Sorry again for the delay! (I have so many open tabs. Trying to close a few.) >>> I glanced at this one to see if it was trivial to apply but: >>> >>> -if (nread < 1) break; >>> +if (nread < 1) { >>> + if (count == wlen) xputc('\n'); >>> + break; >>> +} >>> >>> We have \n at the end of strings already? Under what circumstances would >>> it _not_ output them? >>> >>> make strings >>> $ diff -u <(./strings strings) <(strings strings) | wc >>> 0 0 0 >> current implementation doesn't output newline if last byte does matches: >> if (((toybuf[i] >= 32) && (toybuf[i] <= 126)) || (toybuf[i] == '\t')) { Which violates posix: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/strings.html A printable string is any sequence of four (by default) or more printable characters terminated by a or NUL character. And which never happens in a file conforming to the ELF spec, which uses NUL. (The purpose of this command is to show strings "in a binary file", which originally meant executable format du jour. The current one in ubuntu seems to default to "-a" which shows all printable text, but it's still got the plumbing to parse ELF sections if you ask it to because that was the original purpose. I vaguely recall their ELF traversal had something stupidly exploitable and they switched to -a because of it? Not online to look it up right now...) That said, _we_ violate posix here because I'm not reading ahead and buffering unlimited data before displaying the output (just the -n amount, gneerally 4 bytes), which means I'm not enforcing the nul/newline check because the really simple algorithm I'm doing just treats end of string as data that's not printable. Except it's not so much violation as "historical usage" according to the "Application Usage" section of posix: By default the data area (as opposed to the text, "bss", or header areas) of a binary executable file is scanned. Implementations document which areas are scanned. Some historical implementations do not require NUL or terminators for strings to permit those languages that do not use NUL as a string terminator to have their strings written. You have a point that if we start printing the string, we might as well finish printing it with the newline. But reading ahead and confirming we have a newline or NULL would cut down on the 4-byte false positives. (I'm tempted to do it _just_ for the first byte after what we've buffered, if that's out of spec but not NUL or \n don't print. But that's of of those "easy to implement but hard to document" things that make the command _conceptually_ hard to understand. Blah.) Once again, if somebody ELSE had come to me and pointed out the deviation I'd probably take it seriously, but if it's been in the code for 3 years and I'm the first person to notice... >>> As for the offset+1 bit... >>> >>> $ ./strings -o strings | head -n 3 >>> 567 /lib64/ld-linux-x86-64.so.2 >>> 646 g"X7 >>> 1600 libc.so.6 >>> $ strings -o strings | head -n 3 >>> 1070 /lib64/ld-linux-x86-64.so.2 >>> 1207 g"X7 >>> 3101 libc.so.6 Posix hasn't got -o. It has "-t d" instead. See the "Rationale" section for why they changed it. (Basically the octal vs decimal issue.) I have a todo item about implementing posix -t, but I was waiting for somebody to need it and you still haven't asked about that, but I might as well do it while I'm looking at it now... Sigh, and how much do I care about my other todo item (utf8 support)? Ubuntu 14.04 doesn't: $ strings -a tests/files/utf8/japan.txt $ That's probably one of the big reasons I'm the first person to notice problems with the command. It's verging on obsolete... >>> I don't see how +1 addresses that? (Or what exactly is going on there.) >> gnu strings use octal offsets by default And I used decimal because the strings man page says: -o Like -t o. Some other versions of strings have -o act like -t d instead. Since we can not be compatible with both ways, we simply chose one. Seemed like something I could choose, so I did. The vestigial octal in unix (file permissions and such) is left over from its initial implementation on a pdp-7, which was an 18 bit machine. (I.E. 3*6, each word was 3 octal bytes.) They switched to 8 bit bytes when they ported it to the pdp-11 (up and running in March 1971 according to https://www.bell-labs.com/usr/dmr/www/notes.html). Binutils doing new octal after that was just silly (gcc 1.0 came out in 1987, 16 years after octal output made sense). That said, busybox mindlessly copied binutils here. Does something actually use this? >> toybox strings use decimal, but one-off: Hmmm... hexdump -c agrees that the offset isn't where the data starts. That's a bug. >> $ echo fooo | ./toybox strings -o >> -1 fooo Alright, let's look at the patch again. Yay new test suite entries, the special case
Re: [Toybox] strings tests and bugfixes
ping? On 01/06, Ilya Kuzmich wrote: > On 01/06, Rob Landley wrote: > > On 05/28/2017 11:25 PM, Ilya Kuzmich wrote: > > > Found and fixed some strings bugs. duh. > > > > Sorry for the delay, I started a new Japanese class at the local > > community college (one of those compressed summer session things) and > > between that and $DAYJOB everything else is mostly just on weekends > > right now. :) > > > > I glanced at this one to see if it was trivial to apply but: > > > > -if (nread < 1) break; > > +if (nread < 1) { > > + if (count == wlen) xputc('\n'); > > + break; > > +} > > > > We have \n at the end of strings already? Under what circumstances would > > it _not_ output them? > > > > make strings > > $ diff -u <(./strings strings) <(strings strings) | wc > > 0 0 0 > current implementation doesn't output newline if last byte does matches: > if (((toybuf[i] >= 32) && (toybuf[i] <= 126)) || (toybuf[i] == '\t')) { > > for example: > > $ echo -n 'fooo' | strings > fooo > $ echo -n 'fooo' | ./toybox strings > fooo$ > # note the missing newline > > > > > As for the offset+1 bit... > > > > $ ./strings -o strings | head -n 3 > > 567 /lib64/ld-linux-x86-64.so.2 > > 646 g"X7 > > 1600 libc.so.6 > > $ strings -o strings | head -n 3 > > 1070 /lib64/ld-linux-x86-64.so.2 > > 1207 g"X7 > > 3101 libc.so.6 > > > > I don't see how +1 addresses that? (Or what exactly is going on there.) > gnu strings use octal offsets by default > toybox strings use decimal, but one-off: > > $ echo fooo | ./toybox strings -o > -1 fooo > > > > > Rob ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
Re: [Toybox] strings tests and bugfixes
On 01/06, Rob Landley wrote: > On 05/28/2017 11:25 PM, Ilya Kuzmich wrote: > > Found and fixed some strings bugs. duh. > > Sorry for the delay, I started a new Japanese class at the local > community college (one of those compressed summer session things) and > between that and $DAYJOB everything else is mostly just on weekends > right now. :) > > I glanced at this one to see if it was trivial to apply but: > > -if (nread < 1) break; > +if (nread < 1) { > + if (count == wlen) xputc('\n'); > + break; > +} > > We have \n at the end of strings already? Under what circumstances would > it _not_ output them? > > make strings > $ diff -u <(./strings strings) <(strings strings) | wc > 0 0 0 current implementation doesn't output newline if last byte does matches: if (((toybuf[i] >= 32) && (toybuf[i] <= 126)) || (toybuf[i] == '\t')) { for example: $ echo -n 'fooo' | strings fooo $ echo -n 'fooo' | ./toybox strings fooo$ # note the missing newline > > As for the offset+1 bit... > > $ ./strings -o strings | head -n 3 > 567 /lib64/ld-linux-x86-64.so.2 > 646 g"X7 > 1600 libc.so.6 > $ strings -o strings | head -n 3 > 1070 /lib64/ld-linux-x86-64.so.2 > 1207 g"X7 > 3101 libc.so.6 > > I don't see how +1 addresses that? (Or what exactly is going on there.) gnu strings use octal offsets by default toybox strings use decimal, but one-off: $ echo fooo | ./toybox strings -o -1 fooo > > Rob ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
Re: [Toybox] strings tests and bugfixes
On 05/28/2017 11:25 PM, Ilya Kuzmich wrote: > Found and fixed some strings bugs. duh. Sorry for the delay, I started a new Japanese class at the local community college (one of those compressed summer session things) and between that and $DAYJOB everything else is mostly just on weekends right now. :) I glanced at this one to see if it was trivial to apply but: -if (nread < 1) break; +if (nread < 1) { + if (count == wlen) xputc('\n'); + break; +} We have \n at the end of strings already? Under what circumstances would it _not_ output them? make strings $ diff -u <(./strings strings) <(strings strings) | wc 0 0 0 As for the offset+1 bit... $ ./strings -o strings | head -n 3 567 /lib64/ld-linux-x86-64.so.2 646 g"X7 1600 libc.so.6 $ strings -o strings | head -n 3 1070 /lib64/ld-linux-x86-64.so.2 1207 g"X7 3101 libc.so.6 I don't see how +1 addresses that? (Or what exactly is going on there.) Rob ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net
[Toybox] strings tests and bugfixes
Found and fixed some strings bugs. duh. >From 2a5b48249df851fbab0cb7af3d0f5d45bf2aee37 Mon Sep 17 00:00:00 2001 From: Ilya KuzmichDate: Mon, 29 May 2017 07:05:16 +0300 Subject: [PATCH] strings tests and bugfixes Fixes missing newline in output if last byte of the input is string. Fixes one-off offset bug. Adds strings tests. Signed-off-by: Ilya Kuzmich --- tests/strings.test | 20 toys/posix/strings.c | 7 +-- 2 files changed, 25 insertions(+), 2 deletions(-) create mode 100644 tests/strings.test diff --git a/tests/strings.test b/tests/strings.test new file mode 100644 index 000..8becc2f --- /dev/null +++ b/tests/strings.test @@ -0,0 +1,20 @@ +#!/bin/bash + +[ -f testing.sh ] && . testing.sh + +#testing "name" "command" "result" "infile" "stdin" + +testing "stdin" "strings" "foobar\n" "" "foobar\n" +testing "file" "strings input" "foobar\n" "foobar\n" "" +testing "string to the end" "strings input" "foobar\n" "foobar" "" +testing "short strings" "strings input" "" "foo\nbar\n" "" +testing "-n6" "strings -n6 input" "foobar\n" "foobar\nbaz\n" "" +testing "string and nulls" "strings input" "foobar\nbazfoo\n" \ + "\0foobar\0\0bazfoo\0foo" "" +testing "-f" "strings -f input" "input: foobar\n" "foobar\n" "" +testing "-o" "strings -o input | sed 's/^ *//'" "6 foobar\n" \ + "\0\0\0\0\0\0foobar\n" "" +testing "-o, multiple strings" "strings -n3 -o input | sed 's/^ *//'" \ + "1 foo\n7 bar\n" "\0foo\0b\0bar\n" "" +testing "-fo" "strings -fo input | sed 's/: */: /'" "input: 6 foobar\n" \ + "\0\0\0\0\0\0foobar\n" "" diff --git a/toys/posix/strings.c b/toys/posix/strings.c index f3ce70c..d2154e7 100644 --- a/toys/posix/strings.c +++ b/toys/posix/strings.c @@ -38,7 +38,10 @@ static void do_strings(int fd, char *filename) for (;;) { nread = read(fd, toybuf, sizeof(toybuf)); if (nread < 0) perror_msg_raw(filename); -if (nread < 1) break; +if (nread < 1) { + if (count == wlen) xputc('\n'); + break; +} for (i = 0; i < nread; i++, offset++) { if (((toybuf[i] >= 32) && (toybuf[i] <= 126)) || (toybuf[i] == '\t')) { if (count == wlen) fputc(toybuf[i], stdout); @@ -47,7 +50,7 @@ static void do_strings(int fd, char *filename) if (count == wlen) { if (toys.optflags & FLAG_f) printf("%s: ", filename); if (toys.optflags & FLAG_o) - printf("%7lld ",(long long)(offset - wlen)); + printf("%7lld ",(long long)(offset + 1 - wlen)); printf("%s", string); } } -- 2.7.4 ___ Toybox mailing list Toybox@lists.landley.net http://lists.landley.net/listinfo.cgi/toybox-landley.net