Re: [Toybox] strings tests and bugfixes

2017-07-03 Thread Rob Landley
On 06/14/2017 08:04 AM, Ilya Kuzmich wrote:
> ping?

Sorry again for the delay! (I have so many open tabs. Trying to
close a few.)

>>> I glanced at this one to see if it was trivial to apply but:
>>>
>>> -if (nread < 1) break;
>>> +if (nread < 1) {
>>> +  if (count == wlen) xputc('\n');
>>> +  break;
>>> +}
>>>
>>> We have \n at the end of strings already? Under what circumstances would
>>> it _not_ output them?
>>>
>>>   make strings
>>>   $ diff -u <(./strings strings) <(strings strings) | wc
>>> 0   0   0
>> current implementation doesn't output newline if last byte does matches:
>> if (((toybuf[i] >= 32) && (toybuf[i] <= 126)) || (toybuf[i] == '\t')) {

Which violates posix:

  http://pubs.opengroup.org/onlinepubs/9699919799/utilities/strings.html

  A printable string is any sequence of four (by default) or more
  printable characters terminated by a  or NUL character.

And which never happens in a file conforming to the ELF spec, which uses
NUL. (The purpose of this command is to show strings "in a binary file",
which originally meant executable format du jour. The current one in
ubuntu seems to default to "-a" which shows all printable text, but it's
still got the plumbing to parse ELF sections if you ask it to because
that was the original purpose. I vaguely recall their ELF traversal had
something stupidly exploitable and they switched to -a because of it?
Not online to look it up right now...)

That said, _we_ violate posix here because I'm not reading ahead and
buffering unlimited data before displaying the output (just the -n
amount, gneerally 4 bytes), which means I'm not enforcing the
nul/newline check because the really simple algorithm I'm doing just
treats end of string as data that's not printable.

Except it's not so much violation as "historical usage" according to the
"Application Usage" section of posix:

  By default the data area (as opposed to the text, "bss", or header
  areas) of a binary executable file is scanned. Implementations
  document which areas are scanned.

  Some historical implementations do not require NUL or 
  terminators for strings to permit those languages that do not use NUL
  as a string terminator to have their strings written.

You have a point that if we start printing the string, we might as well
finish printing it with the newline. But reading ahead and confirming we
have a newline or NULL would cut down on the 4-byte false positives.
(I'm tempted to do it _just_ for the first byte after what we've
buffered, if that's out of spec but not NUL or \n don't print. But
that's of of those "easy to implement but hard to document" things that
make the command _conceptually_ hard to understand. Blah.)

Once again, if somebody ELSE had come to me and pointed out the
deviation I'd probably take it seriously, but if it's been in the code
for 3 years and I'm the first person to notice...

>>> As for the offset+1 bit...
>>>
>>>   $ ./strings -o strings | head -n 3
>>>   567 /lib64/ld-linux-x86-64.so.2
>>>   646 g"X7
>>>  1600 libc.so.6
>>>   $ strings -o strings | head -n 3
>>>  1070 /lib64/ld-linux-x86-64.so.2
>>>  1207 g"X7
>>>  3101 libc.so.6

Posix hasn't got -o. It has "-t d" instead. See the "Rationale" section
for why they changed it. (Basically the octal vs decimal issue.)

I have a todo item about implementing posix -t, but I was waiting for
somebody to need it and you still haven't asked about that, but I might
as well do it while I'm looking at it now...

Sigh, and how much do I care about my other todo item (utf8 support)?
Ubuntu 14.04 doesn't:

  $ strings -a tests/files/utf8/japan.txt
  $

That's probably one of the big reasons I'm the first person to notice
problems with the command. It's verging on obsolete...

>>> I don't see how +1 addresses that? (Or what exactly is going on there.)
>> gnu strings use octal offsets by default

And I used decimal because the strings man page says:

  -o  Like -t o.  Some other versions of strings have -o act like -t d
  instead.  Since we can not be compatible with both ways, we simply
  chose one.

Seemed like something I could choose, so I did.

The vestigial octal in unix (file permissions and such) is left over
from its initial implementation on a pdp-7, which was an 18 bit machine.
(I.E. 3*6, each word was 3 octal bytes.) They switched to 8 bit bytes
when they ported it to the pdp-11 (up and running in March 1971
according to https://www.bell-labs.com/usr/dmr/www/notes.html). Binutils
doing new octal after that was just silly (gcc 1.0 came out in 1987, 16
years after octal output made sense).

That said, busybox mindlessly copied binutils here. Does something
actually use this?

>> toybox strings use decimal, but one-off:

Hmmm... hexdump -c agrees that the offset isn't where the data starts.
That's a bug.

>> $ echo fooo | ./toybox strings -o
>>  -1 fooo

Alright, let's look at the patch again. Yay new test suite entries, the
special case 

Re: [Toybox] strings tests and bugfixes

2017-06-14 Thread Ilya Kuzmich
ping?
On 01/06, Ilya Kuzmich wrote:
> On 01/06, Rob Landley wrote:
> > On 05/28/2017 11:25 PM, Ilya Kuzmich wrote:
> > > Found and fixed some strings bugs. duh.
> > 
> > Sorry for the delay, I started a new Japanese class at the local
> > community college (one of those compressed summer session things) and
> > between that and $DAYJOB everything else is mostly just on weekends
> > right now. :)
> > 
> > I glanced at this one to see if it was trivial to apply but:
> > 
> > -if (nread < 1) break;
> > +if (nread < 1) {
> > +  if (count == wlen) xputc('\n');
> > +  break;
> > +}
> > 
> > We have \n at the end of strings already? Under what circumstances would
> > it _not_ output them?
> > 
> >   make strings
> >   $ diff -u <(./strings strings) <(strings strings) | wc
> > 0   0   0
> current implementation doesn't output newline if last byte does matches:
> if (((toybuf[i] >= 32) && (toybuf[i] <= 126)) || (toybuf[i] == '\t')) {
> 
> for example:
> 
> $ echo -n 'fooo' | strings
> fooo
> $ echo -n 'fooo' | ./toybox strings
> fooo$
> # note the missing newline
> 
> > 
> > As for the offset+1 bit...
> > 
> >   $ ./strings -o strings | head -n 3
> >   567 /lib64/ld-linux-x86-64.so.2
> >   646 g"X7
> >  1600 libc.so.6
> >   $ strings -o strings | head -n 3
> >  1070 /lib64/ld-linux-x86-64.so.2
> >  1207 g"X7
> >  3101 libc.so.6
> > 
> > I don't see how +1 addresses that? (Or what exactly is going on there.)
> gnu strings use octal offsets by default
> toybox strings use decimal, but one-off:
> 
> $ echo fooo | ./toybox strings -o
>  -1 fooo
> 
> > 
> > Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] strings tests and bugfixes

2017-06-01 Thread Ilya Kuzmich
On 01/06, Rob Landley wrote:
> On 05/28/2017 11:25 PM, Ilya Kuzmich wrote:
> > Found and fixed some strings bugs. duh.
> 
> Sorry for the delay, I started a new Japanese class at the local
> community college (one of those compressed summer session things) and
> between that and $DAYJOB everything else is mostly just on weekends
> right now. :)
> 
> I glanced at this one to see if it was trivial to apply but:
> 
> -if (nread < 1) break;
> +if (nread < 1) {
> +  if (count == wlen) xputc('\n');
> +  break;
> +}
> 
> We have \n at the end of strings already? Under what circumstances would
> it _not_ output them?
> 
>   make strings
>   $ diff -u <(./strings strings) <(strings strings) | wc
>   0   0   0
current implementation doesn't output newline if last byte does matches:
if (((toybuf[i] >= 32) && (toybuf[i] <= 126)) || (toybuf[i] == '\t')) {

for example:

$ echo -n 'fooo' | strings
fooo
$ echo -n 'fooo' | ./toybox strings
fooo$
# note the missing newline

> 
> As for the offset+1 bit...
> 
>   $ ./strings -o strings | head -n 3
>   567 /lib64/ld-linux-x86-64.so.2
>   646 g"X7
>  1600 libc.so.6
>   $ strings -o strings | head -n 3
>  1070 /lib64/ld-linux-x86-64.so.2
>  1207 g"X7
>  3101 libc.so.6
> 
> I don't see how +1 addresses that? (Or what exactly is going on there.)
gnu strings use octal offsets by default
toybox strings use decimal, but one-off:

$ echo fooo | ./toybox strings -o
 -1 fooo

> 
> Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] strings tests and bugfixes

2017-06-01 Thread Rob Landley
On 05/28/2017 11:25 PM, Ilya Kuzmich wrote:
> Found and fixed some strings bugs. duh.

Sorry for the delay, I started a new Japanese class at the local
community college (one of those compressed summer session things) and
between that and $DAYJOB everything else is mostly just on weekends
right now. :)

I glanced at this one to see if it was trivial to apply but:

-if (nread < 1) break;
+if (nread < 1) {
+  if (count == wlen) xputc('\n');
+  break;
+}

We have \n at the end of strings already? Under what circumstances would
it _not_ output them?

  make strings
  $ diff -u <(./strings strings) <(strings strings) | wc
0   0   0

As for the offset+1 bit...

  $ ./strings -o strings | head -n 3
  567 /lib64/ld-linux-x86-64.so.2
  646 g"X7
 1600 libc.so.6
  $ strings -o strings | head -n 3
 1070 /lib64/ld-linux-x86-64.so.2
 1207 g"X7
 3101 libc.so.6

I don't see how +1 addresses that? (Or what exactly is going on there.)

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


[Toybox] strings tests and bugfixes

2017-05-28 Thread Ilya Kuzmich
Found and fixed some strings bugs. duh.
>From 2a5b48249df851fbab0cb7af3d0f5d45bf2aee37 Mon Sep 17 00:00:00 2001
From: Ilya Kuzmich 
Date: Mon, 29 May 2017 07:05:16 +0300
Subject: [PATCH] strings tests and  bugfixes

Fixes missing newline in output if last byte of the input is string.
Fixes one-off offset bug.
Adds strings tests.

Signed-off-by: Ilya Kuzmich 
---
 tests/strings.test   | 20 
 toys/posix/strings.c |  7 +--
 2 files changed, 25 insertions(+), 2 deletions(-)
 create mode 100644 tests/strings.test

diff --git a/tests/strings.test b/tests/strings.test
new file mode 100644
index 000..8becc2f
--- /dev/null
+++ b/tests/strings.test
@@ -0,0 +1,20 @@
+#!/bin/bash
+
+[ -f testing.sh ] && . testing.sh
+
+#testing "name" "command" "result" "infile" "stdin"
+
+testing "stdin" "strings" "foobar\n" "" "foobar\n"
+testing "file" "strings input" "foobar\n" "foobar\n" ""
+testing "string to the end" "strings input" "foobar\n" "foobar" ""
+testing "short strings" "strings input" "" "foo\nbar\n" ""
+testing "-n6" "strings -n6 input" "foobar\n" "foobar\nbaz\n" ""
+testing "string and nulls" "strings input" "foobar\nbazfoo\n" \
+	"\0foobar\0\0bazfoo\0foo" ""
+testing "-f" "strings -f input" "input: foobar\n" "foobar\n" ""
+testing "-o" "strings -o input | sed 's/^ *//'" "6 foobar\n" \
+	"\0\0\0\0\0\0foobar\n" ""
+testing "-o, multiple strings" "strings -n3 -o input | sed 's/^ *//'" \
+	"1 foo\n7 bar\n" "\0foo\0b\0bar\n" ""
+testing "-fo" "strings -fo input | sed 's/: */: /'" "input: 6 foobar\n" \
+	"\0\0\0\0\0\0foobar\n" ""
diff --git a/toys/posix/strings.c b/toys/posix/strings.c
index f3ce70c..d2154e7 100644
--- a/toys/posix/strings.c
+++ b/toys/posix/strings.c
@@ -38,7 +38,10 @@ static void do_strings(int fd, char *filename)
   for (;;) {
 nread = read(fd, toybuf, sizeof(toybuf));
 if (nread < 0) perror_msg_raw(filename);
-if (nread < 1) break;
+if (nread < 1) {
+  if (count == wlen) xputc('\n');
+  break;
+}
 for (i = 0; i < nread; i++, offset++) {
   if (((toybuf[i] >= 32) && (toybuf[i] <= 126)) || (toybuf[i] == '\t')) {
 if (count == wlen) fputc(toybuf[i], stdout);
@@ -47,7 +50,7 @@ static void do_strings(int fd, char *filename)
   if (count == wlen) {
 if (toys.optflags & FLAG_f) printf("%s: ", filename);
 if (toys.optflags & FLAG_o)
-  printf("%7lld ",(long long)(offset - wlen));
+  printf("%7lld ",(long long)(offset + 1 - wlen));
 printf("%s", string);
   }
 }
-- 
2.7.4

___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net