On Sat, 22 Jul 2023 at 21:29, tito <[email protected]> wrote:
>
> On Sat, 22 Jul 2023 19:31:28 +0200
> "Roberto A. Foglietta" <[email protected]> wrote:
>
> > On Sat, 22 Jul 2023 at 15:40, tito <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I'm not the maintainer so I can say nothing about integration,
> > > I can just point out things that look strange to me and my limited
> > > knowledge.
> > > When I read that this code is faster vs other code as I'm a curious
> > > person I just try to see how much faster it is and why as there
> > > is always something to learn on the busybox mailing list.
> > > If in my little tests it is not faster then I think I'm entitled
> > > to ask questions about it as science results should be reproducible.
> > >
> > > For simple benchmarking maybe reading a big enough file
> > > into memory and feeding it to strings in a few 1000 iterations
> > > should do to avoid bias from hdd/sdd and system load, one shot shows:
> > >
> > > ramtmp="$(mktemp -p /dev/shm/)"
> > > dd if=vmlinux.o of=$ramtmp
> > > echo $ramtmp
> > > /dev/shm/tmp.ll3G2kzKE1
> > >
> > > 1) coreutils strings
> > > time strings $ramtmp > /dev/null
> >
> > This is not correct because you are reading a file in tmpfs while the
>
> Yes, this was exactly the purpose of the test to eliminate all
> factors connected to underlying block devices and time
> the speed of code of the different implementations.
>
Which is wrong because you did a hypothesis which is far away from the
typical usage and in some cases you can even use it because strings
over a 4GB ISO image would not necessarily fit into a tmpfs in every
system. Abstract benchmarks can be funny but do not depict/measure the
reality as usual. Extending this logic, we can trash the Ohm law
because we can reach in the laboratory a near zero temperature!
> >
> > Lines particularly long, more than 4096 characters are divided into
> > blocks with \n. It is clearly a corner case for which \n should be
>
> It was 35 corner cases in a handful of files due to a arbitrary hard-coded
> limitation.
> You should maybe run it on `find /`
Which can be solved quite easily with a cast of a bool:
bool pr = isPrintable(*ch);
use this bool instead of the check into the code and change this one
#define print_text(p,b,c) if(p-b >= 4) { *p++ = 0; printf("%s%c",b,c); }
print_text(p, buffer, pr ? *ch : '\n'); // print collected text
in such a way it prints the current char rather than the new line.
After this change the average of 2x speed has been maintained.
> >
> > > I suspect this could be a problem for integration and also size of code
> > > after integration is relevant.
> >
The size can be reduced using two buffers only without losing
performances because three are redundant. Which is the current size of
the strings applet? Just to have an idea because the size of a single
binary with main() et company cannot immediately be compared with a
busybox applet. For sure is a lot more smaller than the strings which
also require a large shared library:
size /usr/bin/strings
text data bss dec hex filename
20943 1472 64 22479 57cf /usr/bin/strings
ldd /usr/bin/strings
linux-vdso.so.1 (0x00007ffde4fbc000)
libbfd-2.38-system.so => /lib/x86_64-linux-gnu/libbfd-2.38-system.so
(0x00007f6d64cf1000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6d64ac9000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f6d64aad000)
/lib64/ld-linux-x86-64.so.2 (0x00007f6d64e8c000)
size /lib/x86_64-linux-gnu/libbfd-2.38-system.so
text data bss dec hex filename
1434786 94000 680 1529466 17567a
/lib/x86_64-linux-gnu/libbfd-2.38-system.so
> > It is a corner case that could be addressed. I did not check the size
> > of strings in busybox. However, once confirmed that the size is more
> > important than the speed for busybox - I agree on this - then it can
> > be proposed to binutils (or coreutils) depending on which package is
> > included. I found the binary version for aarch64 on binutils, AFAIR.
>
> I wonder why should they be wanting to change their stable code for
> a new implementation?
Because It is very easy to check that it works, it is 2x faster on
average and on a fine-tuned system can reach 4x, the size drops
dramatically considering to free the binary from the large shared
library.
In attachment the new version with the test suite and the benchmark
suite in the header. The benchmark suite did not change with respect
to the script file I just sent.
Best regards, R-
/*
* (C) 2023, Roberto A. Foglietta <[email protected]>
* Released under the GPLv2 license terms.
*
* This is a rework of the original source code in public domain which is here:
*
* https://stackoverflow.com/questions/51389969/\
* implementing-my-own-strings-tool-missing-sequences-gnu-strings-finds
*
*** HOW TO COMPILE *************************************************************
gcc -Wall -O3 strings.c -o strings && strip strings
*** HOW TO TEST ****************************************************************
for i in $(ls -1 /usr/bin/); do if test -f $i; then strings $i > out1.txt;
./strings $i > out2.txt; diff -q out1.txt out2.txt || break; fi; done
diff -q out1.txt out2.txt || { echo file: $i; xxdiff out1.txt out2.txt; }
*** PERFORMANCES ***************************************************************
gcc -Wall -O3 strings.orig.c -o strings && strip strings && rm -f [12].txt
time strings /usr/bin/busybox >1.txt
real 0m0.035s
time ./strings /usr/bin/busybox >2.txt
real 0m1.843s
gcc -Wall -O3 strings.c -o strings && strip strings && rm -f [12].txt
time strings /usr/bin/busybox >1.txt
real 0m0.033s
time ./strings /usr/bin/busybox >2.txt
real 0m0.012s
*** FOOTPRINT ******************************************************************
gcc -Wall -O3 strings.c -o strings && strip strings && size strings
size ./strings # USE_MALLOC=0 on amd64 no change in execution time
text data bss dec hex filename
2965 664 48 3677 e5d ./strings
size ./strings # USE_MALLOC=1 on amd64 no change in execution time
text data bss dec hex filename
2993 672 48 3713 e81 ./strings
gcc -Wall -Os strings.c -o strings && strip strings && size strings
size ./strings # USE_MALLOC=0 on amd64 no change in execution time
text data bss dec hex filename
2865 664 48 3577 df9 ./strings
size ./strings # USE_MALLOC=1 on amd64 no change in execution time
text data bss dec hex filename
2929 672 48 3649 e41 ./strings
*** BENCHMARK SUITE ************************************************************
#!/bin/bash
export finput="/boot/grub/unicode.pf2" cdrop=disabled
cachedrop() {
if [ "$cdrop" = "enabled" ]; then
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null
fi
return 0
}
stats() {
local tmpf=$(mktemp -p "${TMPDIR:-/tmp}" -t time.XXXX) n=${2:-100}
local cmd=${1:-$(which busybox) strings $finput} m=50
if [ "$n" != "100" ]; then m=$(( (n+1)/2 )); fi
for i in $(seq 1 $n); do cachedrop; eval time $cmd; done 2>$tmpf
{
echo
echo "$cmd ${3:-}"
sed -ne "s,real\t,min: ,p" $tmpf | sort -n | head -n1
let avg=$(sed -ne "s,real\t0m0.[0]*\([0-9]*\)s,\\1,p" $tmpf | tr '\n' '+')0
printf "avg: 0m0.%03ds\n" $(( (m+avg)/n ))
sed -ne "s,real\t,max: ,p" $tmpf | sort -n | tail -n1
} >&2
rm -f $tmpf
}
benchmark() {
local statf=${1:-2.txt} bbcmd=$(which busybox) fname="$finput"
rm -f $statf; $bbcmd strings $bbcmd >/dev/null # just to fill the cache
cachedrop # and drop it
stats "$bbcmd strings $fname" 100 >/dev/null 2>&1 # then unleash the CPU
rm -f 1.txt; cmd="$bbcmd strings $fname";
{ stats "$cmd" 100 "term";
stats "$cmd" 100 "null" >/dev/null;
stats "$cmd" 100 "file" >1.txt; } 2>>$statf
if [ "$cdrop" != "enabled" ]; then
rm -f 1.txt; cmd="cat $fname | $bbcmd strings";
{ stats "$cmd" 100 "term";
stats "$cmd" 100 "null ">/dev/null;
stats "$cmd" 100 "file ">1.txt; } 2>>$statf
fi
rm -f 1.txt; cmd="./strings $fname";
{ stats "$cmd" 100 "term";
stats "$cmd" 100 "null" >/dev/null;
stats "$cmd" 100 "file" >1.txt; } 2>>$statf
if [ "$cdrop" != "enabled" ]; then
rm -f 1.txt; cmd="cat $fname | ./strings";
{ stats "$cmd" 100 "term";
stats "$cmd" 100 "null" >/dev/null;
stats "$cmd" 100 "file" >1.txt; } 2>>$statf
fi
clear; more $statf; echo -e "\nstats file: $statf\n"
}
benchmark stats.txt
*******************************************************************************/
#define USE_MALLOC 0
#include <stdio.h>
#if USE_MALLOC
#include <malloc.h>
#endif
#include <stdbool.h>
#include <unistd.h>
#include <fcntl.h>
#define isPrintable(c) ((c) == 0x09 || ((c) > 0x1f && (c) < 0x7f))
#define print_text(p,b,c) if(p-b >= 4) { *p++ = 0; printf("%s%c",b,c); }
#define BUFSIZE 4096 //RAF: memory page typical size
int main(int argc, char * argv [])
{
#if USE_MALLOC
char *p, *buffer, *stdout_buffer, *file_buffer;
#else
char buffer[4096], stdout_buffer[4096], file_buffer[4096];
char *p = buffer;
#endif
int n, fd = -1;
if(argv[1] && !argv[1][0])
{
fprintf(stderr, "Usage: %s file\n", argv[0]);
return 1;
}
//RAF: nice to have '-' but it is not compatible with binutils strings
else if(argc < 2 /*|| (argv[1] && argv[1][0] == '-')*/)
{
fd = fileno(stdin);
}
if(fd == -1)
{
fd = open(argv[1], O_RDONLY);
if(fd < 0) {
fprintf(stderr, "Could not open %s\n", argv[1]);
return 1;
}
}
#if USE_MALLOC
buffer = malloc(BUFSIZE*3);
p = buffer;
if(!p) {
fprintf(stderr, "Could not malloc %d x 3\n", 4096);
close(fd);
return 1;
}
stdout_buffer = &p[BUFSIZE];
file_buffer = &p[BUFSIZE*2];
#endif
setvbuf(stdout, stdout_buffer, _IOFBF, BUFSIZE);
while(1)
{
char *ch = file_buffer;
n = read(fd, file_buffer, BUFSIZE);
if(n <= 0)
break;
while(n-- > 0)
{
bool pr = isPrintable(*ch);
if(pr && (p-buffer < BUFSIZE-5))
{
*p++ = *ch;
}
else
{
print_text(p, buffer, pr ? *ch : '\n'); // print collected text
p = buffer;
}
ch++;
}
}
print_text(p, buffer, '\n'); // print the rest, if any
fflush(stdout);
#if USE_MALLOC
free(buffer);
#endif
close(fd);
return 0;
}
_______________________________________________
busybox mailing list
[email protected]
http://lists.busybox.net/mailman/listinfo/busybox