On Sun, 23 Jul 2023 at 13:18, tito <farmat...@tiscali.it> wrote:
>
> On Sun, 23 Jul 2023 12:00:56 +0200
> "Roberto A. Foglietta" <roberto.foglie...@gmail.com> wrote:

> > >
> > > 1) multiple file handling (a must i would dare to say)
> >
> > Which is not such a problem, after all
> >
> > for i in "$@"; do simply-strings "$i" | sed -e "s/^/$i:/"; done
> >
> > the sed will include also the file name in front of the string which
> > is useful for grepping. However, the single-file limitation brings to
> > personalize the approach:
> >
> > for i in "$@"; do simply-strings "$i" | grep -w "word" && break; done; echo 
> > $i
>
> Don't cheat, this change would break other people's scripts.

Other people are not anymore into the scene, since the moment that we
established that reinventing the wheel is not efficient nor useful.


>
> > Yes, strings has a lot of options and also busybox have several
> > options. This is the best critic about proceeding with an integration.
> > I will check if I can put an optimization into bb strings, just for my
> > own curiosity.
>
> This would be far better than reinventing the wheel.
>

Reinventing the wheel is a good way to understand how the wheel works
and improve it. We just concluded that there is no reason to reinvent
the wheel completely. However, the simple-strings can be useful when
its deployment fixes fulfill a void better than replacing a
fundamental system component like busybox which can break future OTA.
In particular, it is fine as a service/rescue/recovery image in which
the space is limited and the full compatibility with strings or
busybox strings is not necessary and for everything else custom
scripts can easily compensate.

About improving busybox strings and more in general its printf
performance, it is about this:

setvbuf(stdout, (char *)stdout_buffer, _IOFBF, BUFSIZE);

Obviously a large static buffer can impact the footprint but as long
as malloc() is used into the busybox - and in its library I remember
there were sanitising wrappers for it - then it would not be such a
big deal to use a dynamically allocated buffer. The tricky aspect is
about the applet forking. A topic that I do not know but I saw an
option "no fork" in the config. I did not even start to see the code,
therefore I am just wondering about.


> >
> > > 3) output compatible with original gnu strings
> > >
> > > > In attachment the new version with the test suite and the benchmark
> > > > suite in the header. The benchmark suite did not change with respect
> > > > to the script file I just sent.
> > > >
> > > > Best regards, R-
> > >
> > > BTW: there still seem to be corner-cases:
> > > list=`find /usr`
> > > for i in $list; do if test -f $i; then  ./strings $i > out1.txt; strings 
> > > $i > out2.txt; diff -q out1.txt out2.txt; fi; done
> > > Files out1.txt and out2.txt differ
> > > Files out1.txt and out2.txt differ
> > > Files out1.txt and out2.txt differ
> > > Files out1.txt and out2.txt differ
> > >
> > > test is still running....
> >
> > ok, I will do a run. Can you please echo the finenames, instead?
> >
> > for i in $list; do if test -f $i; then  ./strings $i > out1.txt;
> > strings $i > out2.txt; diff -q out1.txt out2.txt >/dev/null || echo
> > $i; fi; done
> >

The version in attachment also solves the rest of the problem that my
/usr could have raised with the previous version.  Moreover, I have
further developed the benchmark and the testing suites. You might find
interesting the new part of the benchmark suite about 'dd' used as an
alternative of /dev/null for giving us a transfer speed. As you can
see, if you wish to do strings on tmpfs then for each different file
you need to copy it into the tmpfs. For this reason, copying in tmpfs
+ 100 strings run on the same file is like cheating <-- you started!
;-)


>
> if you hire me as beta tester....at least you own me a beer if we ever met in 
> person.
>

Sure, you are welcome. I live in Genoa, at the moment - you can easily
find my mobile telephone number by googling my name (well, to be
precise: it is a brand strongly based on my name).

In another context, I saw that there is the policy of paying by paypal
& co. a small amount of money IMHO, it is a very bad marketing policy
which seriously impair the value of a professionist. However, when
someone acts outside its professional sector like - blogging,
zero-hope commercial projects, end-users guides, et similia - then it
is fine to ask, IMHO. As long as it is clear what someone asks.

More in general, my common attitude is to raise and save money to
start my own company and pay people to work for/with me. But everytime
my incoming or my company business is going well, the people around me
go mad and f*ck-up everything without any reasonable way to stop them.
Now, I got quita a clear picture about it but this is definitely
off-topic.

Cheers, R-
/*
 * (C) 2023, Roberto A. Foglietta <roberto.foglie...@gmail.com>
 *           Released under the GPLv2 license terms.
 *
 * This is a rework of the original source code in public domain which is here:
 *
 * https://stackoverflow.com/questions/51389969/\
 *      implementing-my-own-strings-tool-missing-sequences-gnu-strings-finds
 *
*** HOW TO COMPILE *************************************************************
 
 gcc -Wall -O3 strings.c -o strings && strip strings
 
*** HOW TO TEST ****************************************************************

#!/bin/bash
#
# (C) 2023, Roberto A. Foglietta <roberto.foglie...@gmail.com>
#           Released under the GPLv2 license terms.
#
#!/bin/bash

gcc -Wall -Werror -O3 strings.c -o strings ||\
	exit 1 && strip strings && size strings

bb="busybox"; if ! echo "Using ${bb:+$bb }strings" | $bb strings 2>/dev/null |\
	grep .; then bb=''; fi;

list=${1:-$(find /usr/ -type f | grep -v ' ')}

out[1]='/tmp/out1.txt'
out[2]='/tmp/out2.txt'

time {
	for i in $list; do
		$bb strings $i   >${out[1]}
		  ./strings $i   >${out[2]}
		diff -q ${out[1]} ${out[2]} || break
	done
}

diff -pruN ${out[1]} ${out[2]} || { echo file: $i; xxdiff ${out[1]} ${out[2]}; }
 
*** PERFORMANCES ***************************************************************

 gcc -Wall -O3 strings.orig.c -o strings && strip strings && rm -f [12].txt

 time   strings /usr/bin/busybox >1.txt
 real 0m0.035s
 time ./strings /usr/bin/busybox >2.txt
 real 0m1.843s
 
 gcc -Wall -O3 strings.c -o strings && strip strings && rm -f [12].txt

 time   strings /usr/bin/busybox >1.txt
 real 0m0.033s
 time ./strings /usr/bin/busybox >2.txt
 real 0m0.012s

*** FOOTPRINT ****************************************************************** 

 gcc -Wall -O3 strings.c -o strings && strip strings && size ./strings
 
 size ./strings # USE_MALLOC=0 on amd64 no change in execution time
  text	   data	    bss	    dec	    hex	filename
  3050	    672	     48	   3770	    eba	./strings

 size ./strings # USE_MALLOC=1 on amd64 no change in execution time
  text	   data	    bss	    dec	    hex	filename
  3094	    680	     48	   3822	    eee	./strings

 gcc -Wall -Os strings.c -o strings && strip strings && size ./strings

 size ./strings # USE_MALLOC=0 on amd64 no change in execution time
  text	   data	    bss	    dec	    hex	filename
  2966	    672	     48	   3686	    e66	./strings

 size ./strings # USE_MALLOC=1 on amd64 no change in execution time
  text	   data	    bss	    dec	    hex	filename
  3046	    680	     48	   3774	    ebe	./string

*** BENCHMARK SUITE ************************************************************

#!/bin/bash
#
# (C) 2023, Roberto A. Foglietta <roberto.foglie...@gmail.com>
#           Released under the GPLv2 license terms.
#
set -m

export finput="${finput:-$(ls -1 '/usr/lib/'*'/libc.so.6' | head -n1)}"
export cdrop="${cdrop:-0}" # or 1:enabled
export tmpfs="${tmpfs:-0}" # or 1:enabled
export statf="stats.txt" tmpout="./1.txt"

tmpout_sync() { :; }

if [ ! -e "$finput" ]; then
	echo "ERROR: file '$finput' does not exist, set finput and retry."
	exit 1
fi

if [ "$(whoami)" != "root" ]; then
	echo "ERROR: this script needs to be executed by root, abort."
	exit 1
fi

cachedrop() {
    if [ "$cdrop" = "1" ]; then
		sync; echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null
	fi
    return 0
}

stats() {
    local tmpf=$(mktemp -p "${TMPDIR:-/tmp}" -t time.XXXX) n=${2:-100}
    local cmd=${1:-$(which busybox) strings $finput} m=50

    if [ "$n" != "100" ]; then m=$(( (n+1)/2 )); fi
    for i in $(seq 1 $n); do
		cachedrop; time { eval $cmd; tmpout_sync; }
    done 2>$tmpf

    {
    echo
    echo "$cmd ${3:-} with tmpfs=$tmpfs"
    sed -ne "s,real\t,min: ,p" $tmpf | sort -n | head -n1
    let avg=$(sed -ne "s,real\t0m0.[0]*\([0-9]*\)s,\\1,p" $tmpf | tr '\n' '+')0
    printf "avg: 0m0.%03ds\n" $(( (m+avg)/n ))
    sed -ne "s,real\t,max: ,p" $tmpf | sort -n | tail -n1
    } >&2

    rm -f $tmpf
}

benchmark() {
	local bbcmd=$(which busybox) fname="$finput"

    $bbcmd strings $bbcmd >/dev/null                   # just to fill the cache
	cachedrop                                          # and drop it
	stats "$bbcmd strings $fname" 100 >/dev/null 2>&1  # then unleash the CPU

    rm -f $tmpout
    cmdlist=""

	for bin in './' ${bbcmd:+"$bbcmd "} ''; do
		stats "${bin}strings $fname" 100 "term";
	done
	for bin in './' ${bbcmd:+"$bbcmd "} ''; do
		stats "${bin}strings $fname" 100 "null" >/dev/null;
	done
	for bin in './' ${bbcmd:+"$bbcmd "} ''; do
		stats "${bin}strings $fname" 100 "file" >$tmpout; rm -f $tmpout
	done

	test "$cdrop" = "1" && return 0;

	for bin in './' ${bbcmd:+"$bbcmd "} ''; do
		stats "cat $fname | ${bin}strings" 100 "term";
	done
	for bin in './' ${bbcmd:+"$bbcmd "} ''; do
		stats "cat $fname | ${bin}strings" 100 "null">/dev/null;
	done
	for bin in './' ${bbcmd:+"$bbcmd "} ''; do
		stats "cat $fname | ${bin}strings" 100 "file">$tmpout; rm -f $tmpout
	done
}

if [ "$tmpfs" = "1" ]; then
	tmpdir=/tmp/tmpfs
	tmpout=$tmpdir/1.str
	mkdir -p "$tmpdir";
	if ! mount -t tmpfs tmpfs "$tmpdir/"; then
		echo -e "\nERROR: could not mount tmpfs in '$tmpdir', abort.\n"
		exit 1
	fi
	trap "rm -f '$tmpout'; umount -l '$tmpdir'; rm -rf '$tmpdir'" EXIT
	echo -e "\ntmpfs enabled and mounted in $tmpdir" >&2
	tmpout_sync() { sync "$tmpout" 2>/dev/null ||:; }
	export TMPDIR=$tmpdir
fi

rm -f "$statf"
touch "$statf"
( exec -a myponytail tail -f "$statf" & )

benchmark 2>>"$statf"

cachedrop() {
	echo "System cache drop with filesystems sync"
    sync; echo 3 | tee /proc/sys/vm/drop_caches >/dev/null
}

benchmark2() {
	for dir in /bin /usr/bin; do
		test -L $dir && continue
		for cmd in 'cdrop=1 ./strings "$f"' './strings "$f"' \
			       'cdrop=1 busybox strings "$f"' 'busybox strings "$f"' \
			       'cdrop=1 strings "$f"' 'strings "$f"'
		do
		{
			echo
			echo "$cmd" | grep -qw 'cdrop=1' && cachedrop;
			echo "For every file '\$f' in '$dir' eval '$cmd'";
		} >&2
			file_list="$(find $dir/ -type f | grep -v ' ')"
		{
			time for f in $file_list; do eval "$cmd"; done | dd of=/dev/null
		} 2>&1 | grep -E "real|bytes" >&2
		done
	done
}

benchmark2 2>>"$statf"

echo 2>>"$statf"
sync "$statf"
kill $(pgrep -f myponytail)
echo -e "\nstats file: $statf\n"
more "$statf"
 
*******************************************************************************/
 
#define USE_MALLOC 1

#include <stdio.h>
#include <string.h>
#if USE_MALLOC
#include <malloc.h>
#endif
#include <stdbool.h>
#include <unistd.h>
#include <fcntl.h>

#define isPrintable(c) ((c) == 0x09 || ((c) > 0x1f && (c) < 0x7f))

#define print_text(p,b,c) if(p-b >= 4) { *p++ = (c); *p++ = 0; printf("%s",b); }

#define BUFSIZE 4096 //RAF: memory page typical size

int main(int argc, char * argv [])
{
#if USE_MALLOC
    unsigned char *p, *ch = 0, *buffer, *stdout_buffer, *file_buffer;
#else
    unsigned char buffer[4096], stdout_buffer[4096], file_buffer[4096];
    unsigned char *p = buffer, *ch = 0;
#endif
    int n, nr = 0, fd = -1;
    bool ltpr = 0, pr = 0;

    if(argv[1] && !argv[1][0])
    {
		fprintf(stderr, "Usage: %s file\n", argv[0]);
		return 1;
    } 
    //RAF: nice to have '-' but it is not compatible with binutils strings
    else if(argc < 2 /*|| (argv[1] && argv[1][0] == '-')*/)
    {
		fd = fileno(stdin);
    }
    
    if(fd == -1)
    {
		fd = open(argv[1], O_RDONLY);
		if(fd < 0) {
			fprintf(stderr, "Could not open %s\n", argv[1]);
			return 1;
		}
    }

#if USE_MALLOC
    buffer = malloc(BUFSIZE*3);
    p = buffer;
    if(!p) {
	    fprintf(stderr, "Could not malloc %d x 3\n", 4096);
	    close(fd);
		return 1;
    }
	stdout_buffer = &p[BUFSIZE];
	file_buffer = &p[BUFSIZE*2];
#endif
    
    setvbuf(stdout, (char *)stdout_buffer, _IOFBF, BUFSIZE);
    
    while(1)
    {
		ch = NULL;
		n = read(fd, file_buffer, BUFSIZE);
		if(n <= 0)
			break;
		ch = file_buffer;
				
		while(n-- > 0)
		{
			nr = p - buffer;
			pr = isPrintable(*ch);
			
		    if(pr && (nr < BUFSIZE-7))
		    {
		    	*p++ = *ch;
		    }
		    else
		    {
			    if(ltpr || nr > 3) {
					ltpr = pr;
					*p++ = pr ? *ch : '\n';
					*p++ = 0;
					printf("%s", buffer);
			    }
		        p = buffer;
		    }
		    ch++;
		}
	}
#if 0 //RAF: this is just for debugging and it can be removed or not as you like
	*p = 0;
	fprintf(stderr, "ltpr: %d, nr: %d, len: %ld, ch: 0x%02x %s, buf: '%s'\n",
		ltpr, nr, p - buffer, ch ? *ch : 0, ch ? "(char)" : "(null)", buffer);
#endif
	if(ltpr || p - buffer > 3) {
		*p = 0;
		printf("%s\n", buffer);
	}

    fflush(stdout);
#if USE_MALLOC
    free(buffer);
#endif
    close(fd);

    return 0;
}
_______________________________________________
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Reply via email to