On Sat, 22 Jul 2023 at 21:29, tito <[email protected]> wrote:
>
> On Sat, 22 Jul 2023 19:31:28 +0200
> "Roberto A. Foglietta" <[email protected]> wrote:
>
> > On Sat, 22 Jul 2023 at 15:40, tito <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I'm not the maintainer so I can say nothing about integration,
> > > I can just point out things that look strange to me and my limited 
> > > knowledge.
> > > When I read that this code is faster vs other code as I'm a curious
> > > person I just try to see how much faster it is and why as there
> > > is always something to learn on the busybox mailing list.
> > > If in my little tests it is not faster then I think I'm entitled
> > > to ask questions about it as science results should be reproducible.
> > >
> > > For simple benchmarking maybe reading a big enough file
> > > into memory and feeding it to strings in a few 1000 iterations
> > > should do to avoid bias from hdd/sdd and system load, one shot shows:
> > >
> > > ramtmp="$(mktemp -p /dev/shm/)"
> > >  dd if=vmlinux.o of=$ramtmp
> > > echo $ramtmp
> > > /dev/shm/tmp.ll3G2kzKE1
> > >
> > > 1) coreutils strings
> > > time  strings $ramtmp > /dev/null
> >
> > This is not correct because you are reading a file in tmpfs while the
>
> Yes, this was exactly the purpose of the test to eliminate all
> factors connected to underlying block devices and time
> the speed of code of the different implementations.
>

Which is wrong because you did a hypothesis which is far away from the
typical usage and in some cases you can even use it because strings
over a 4GB ISO image would not necessarily fit into a tmpfs in every
system. Abstract benchmarks can be funny but do not depict/measure the
reality as usual. Extending this logic, we can trash the Ohm law
because we can reach in the laboratory a near zero temperature!

> >
> > Lines particularly long, more than 4096 characters are divided into
> > blocks with \n. It is clearly a corner case for which \n should be
>
> It was 35 corner cases in a handful of files due to a arbitrary hard-coded 
> limitation.
> You should maybe run it on `find /`

Which can be solved quite easily with a cast of a bool:

bool pr = isPrintable(*ch);

use this bool instead of the check into the code and change this one

#define print_text(p,b,c) if(p-b >= 4) { *p++ = 0; printf("%s%c",b,c); }

print_text(p, buffer, pr ? *ch : '\n'); // print collected text

in such a way it prints the current char rather than the new line.
After this change the average of 2x speed has been maintained.

> >
> > > I suspect this could be a problem for integration  and also size of code 
> > > after integration is relevant.
> >

The size can be reduced using two buffers only without losing
performances because three are redundant. Which is the current size of
the strings applet? Just to have an idea because the size of a single
binary with main() et company cannot immediately be compared with a
busybox applet. For sure is a lot more smaller than the strings which
also require a large shared library:

size /usr/bin/strings
   text    data     bss     dec     hex filename
  20943    1472      64   22479    57cf /usr/bin/strings

ldd /usr/bin/strings
linux-vdso.so.1 (0x00007ffde4fbc000)
libbfd-2.38-system.so => /lib/x86_64-linux-gnu/libbfd-2.38-system.so
(0x00007f6d64cf1000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6d64ac9000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f6d64aad000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6d64e8c000)

size /lib/x86_64-linux-gnu/libbfd-2.38-system.so
   text    data     bss     dec     hex filename
1434786   94000     680 1529466 17567a
/lib/x86_64-linux-gnu/libbfd-2.38-system.so

> > It is a corner case that could be addressed. I did not check the size
> > of strings in busybox. However, once confirmed that the size is more
> > important than the speed for busybox - I agree on this - then it can
> > be proposed to binutils (or coreutils) depending on which package is
> > included. I found the binary version for aarch64 on binutils, AFAIR.
>
> I wonder why should they be wanting to change their stable code for
> a new implementation?

Because It is very easy to check that it works, it is 2x faster on
average and on a fine-tuned system can reach 4x, the size drops
dramatically considering to free the binary from the large shared
library.

In attachment the new version with the test suite and the benchmark
suite in the header. The benchmark suite did not change with respect
to the script file I just sent.

Best regards, R-
/*
 * (C) 2023, Roberto A. Foglietta <[email protected]>
 *           Released under the GPLv2 license terms.
 *
 * This is a rework of the original source code in public domain which is here:
 *
 * https://stackoverflow.com/questions/51389969/\
 *      implementing-my-own-strings-tool-missing-sequences-gnu-strings-finds
 *
*** HOW TO COMPILE *************************************************************
 
 gcc -Wall -O3 strings.c -o strings && strip strings
 
*** HOW TO TEST ****************************************************************
 
 for i in $(ls -1 /usr/bin/); do if test -f $i; then strings $i > out1.txt; 
     ./strings $i > out2.txt; diff -q out1.txt out2.txt || break; fi; done
 diff -q out1.txt out2.txt || { echo file: $i; xxdiff out1.txt out2.txt; }
 
*** PERFORMANCES ***************************************************************

 gcc -Wall -O3 strings.orig.c -o strings && strip strings && rm -f [12].txt

 time   strings /usr/bin/busybox >1.txt
 real 0m0.035s
 time ./strings /usr/bin/busybox >2.txt
 real 0m1.843s
 
 gcc -Wall -O3 strings.c -o strings && strip strings && rm -f [12].txt

 time   strings /usr/bin/busybox >1.txt
 real 0m0.033s
 time ./strings /usr/bin/busybox >2.txt
 real 0m0.012s

*** FOOTPRINT ****************************************************************** 

 gcc -Wall -O3 strings.c -o strings && strip strings && size strings
 
 size ./strings # USE_MALLOC=0 on amd64 no change in execution time
  text	   data	    bss	    dec	    hex	filename
  2965	    664	     48	   3677	    e5d	./strings

 size ./strings # USE_MALLOC=1 on amd64 no change in execution time
  text	   data	    bss	    dec	    hex	filename
  2993	    672	     48	   3713	    e81	./strings

 gcc -Wall -Os strings.c -o strings && strip strings && size strings

 size ./strings # USE_MALLOC=0 on amd64 no change in execution time
  text	   data	    bss	    dec	    hex	filename
  2865	    664	     48	   3577	    df9	./strings

 size ./strings # USE_MALLOC=1 on amd64 no change in execution time
  text	   data	    bss	    dec	    hex	filename
  2929	    672	     48	   3649	    e41	./strings

*** BENCHMARK SUITE ************************************************************

#!/bin/bash

export finput="/boot/grub/unicode.pf2" cdrop=disabled

cachedrop() {
    if [ "$cdrop" = "enabled" ]; then
		sync; echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null
	fi
    return 0
}

stats() {
    local tmpf=$(mktemp -p "${TMPDIR:-/tmp}" -t time.XXXX) n=${2:-100}
    local cmd=${1:-$(which busybox) strings $finput} m=50

    if [ "$n" != "100" ]; then m=$(( (n+1)/2 )); fi
    for i in $(seq 1 $n); do cachedrop; eval time $cmd; done 2>$tmpf

    {
    echo
    echo "$cmd ${3:-}"
    sed -ne "s,real\t,min: ,p" $tmpf | sort -n | head -n1
    let avg=$(sed -ne "s,real\t0m0.[0]*\([0-9]*\)s,\\1,p" $tmpf | tr '\n' '+')0
    printf "avg: 0m0.%03ds\n" $(( (m+avg)/n ))
    sed -ne "s,real\t,max: ,p" $tmpf | sort -n | tail -n1
    } >&2

    rm -f $tmpf
}

benchmark() {
	local statf=${1:-2.txt} bbcmd=$(which busybox) fname="$finput"

    rm -f $statf; $bbcmd strings $bbcmd >/dev/null     # just to fill the cache
	cachedrop                                          # and drop it
	stats "$bbcmd strings $fname" 100 >/dev/null 2>&1  # then unleash the CPU

    rm -f 1.txt; cmd="$bbcmd strings $fname";
    { stats "$cmd" 100 "term";
      stats "$cmd" 100 "null" >/dev/null;
      stats "$cmd" 100 "file" >1.txt; } 2>>$statf

	if [ "$cdrop" != "enabled" ]; then
		rm -f 1.txt; cmd="cat $fname | $bbcmd strings";
		{ stats "$cmd" 100 "term";
		  stats "$cmd" 100 "null ">/dev/null;
		  stats "$cmd" 100 "file ">1.txt; } 2>>$statf
    fi

    rm -f 1.txt; cmd="./strings $fname";
    { stats "$cmd" 100 "term";
      stats "$cmd" 100 "null" >/dev/null;
      stats "$cmd" 100 "file" >1.txt; } 2>>$statf

	if [ "$cdrop" != "enabled" ]; then
		rm -f 1.txt; cmd="cat $fname | ./strings";
		{ stats "$cmd" 100 "term";
		  stats "$cmd" 100 "null" >/dev/null;
		  stats "$cmd" 100 "file" >1.txt; } 2>>$statf
	fi

    clear; more $statf; echo -e "\nstats file: $statf\n"
}

benchmark stats.txt
 
*******************************************************************************/
 
#define USE_MALLOC 0

#include <stdio.h>
#if USE_MALLOC
#include <malloc.h>
#endif
#include <stdbool.h>
#include <unistd.h>
#include <fcntl.h>

#define isPrintable(c) ((c) == 0x09 || ((c) > 0x1f && (c) < 0x7f))

#define print_text(p,b,c) if(p-b >= 4) { *p++ = 0; printf("%s%c",b,c); }

#define BUFSIZE 4096 //RAF: memory page typical size

int main(int argc, char * argv [])
{
#if USE_MALLOC
    char *p, *buffer, *stdout_buffer, *file_buffer;
#else
    char buffer[4096], stdout_buffer[4096], file_buffer[4096];
    char *p = buffer;
#endif
    int n, fd = -1;

    if(argv[1] && !argv[1][0])
    {
		fprintf(stderr, "Usage: %s file\n", argv[0]);
		return 1;
    } 
    //RAF: nice to have '-' but it is not compatible with binutils strings
    else if(argc < 2 /*|| (argv[1] && argv[1][0] == '-')*/)
    {
		fd = fileno(stdin);
    }
    
    if(fd == -1)
    {
		fd = open(argv[1], O_RDONLY);
		if(fd < 0) {
			fprintf(stderr, "Could not open %s\n", argv[1]);
			return 1;
		}
    }

#if USE_MALLOC
    buffer = malloc(BUFSIZE*3);
    p = buffer;
    if(!p) {
	    fprintf(stderr, "Could not malloc %d x 3\n", 4096);
	    close(fd);
		return 1;
    }
	stdout_buffer = &p[BUFSIZE];
	file_buffer = &p[BUFSIZE*2];
#endif
    
    setvbuf(stdout, stdout_buffer, _IOFBF, BUFSIZE);
    
    while(1)
    {
    	char *ch = file_buffer;
    	
		n = read(fd, file_buffer, BUFSIZE);
		if(n <= 0)
			break;

		while(n-- > 0)
		{
			bool pr = isPrintable(*ch);
			
		    if(pr && (p-buffer < BUFSIZE-5))
		    {
		    	*p++ = *ch;
		    }
		    else
		    {
			    print_text(p, buffer, pr ? *ch : '\n'); // print collected text
		        p = buffer;
		    }
		    ch++;
		}
	}
	print_text(p, buffer, '\n'); // print the rest, if any

    fflush(stdout);
#if USE_MALLOC
    free(buffer);
#endif
    close(fd);

    return 0;
}
_______________________________________________
busybox mailing list
[email protected]
http://lists.busybox.net/mailman/listinfo/busybox

Reply via email to