On Sat, 22 Jul 2023 at 15:40, tito <[email protected]> wrote:
> Hi,
>
> I'm not the maintainer so I can say nothing about integration,
> I can just point out things that look strange to me and my limited knowledge.
> When I read that this code is faster vs other code as I'm a curious
> person I just try to see how much faster it is and why as there
> is always something to learn on the busybox mailing list.
> If in my little tests it is not faster then I think I'm entitled
> to ask questions about it as science results should be reproducible.
>
> For simple benchmarking maybe reading a big enough file
> into memory and feeding it to strings in a few 1000 iterations
> should do to avoid bias from hdd/sdd and system load, one shot shows:
>
> ramtmp="$(mktemp -p /dev/shm/)"
> dd if=vmlinux.o of=$ramtmp
> echo $ramtmp
> /dev/shm/tmp.ll3G2kzKE1
>
> 1) coreutils strings
> time strings $ramtmp > /dev/null
This is not correct, because you are reading a file from tmpfs, while in almost all real cases the input is not read that way. Sometimes it comes from a ramfs, but usually not. It does make perfect sense for the output to be written to a tmpfs, especially on devices whose HDD/SSD/flash storage is particularly slow. After all, strings output is temporary by nature and, IMHO, it is usually piped to grep.

>
> of course a few more iterations would give statistically better results.

The suite I provided with benchmark.sh is the answer: by running with cache dropping both enabled and disabled, it checks the two most important system states across all the cases that matter in real life, AFAIK.

> 2) busybox strings vs new strings:
>
> for i in $list; do if test -f $i; then ./Desktop/strings $i > out1.txt;
> ./Desktop/busybox strings $i > out2.txt; diff -q out1.txt out2.txt; fi; done
> Files out1.txt and out2.txt differ

I confirmed that there are some differences in the output with this:

for i in /usr/bin/*; do
    if test -f "$i"; then
        ./strings "$i" > out1.txt
        busybox strings "$i" > out2.txt
        diff -q out1.txt out2.txt || break
    fi
done
diff -pruN out1.txt out2.txt

Particularly long lines, more than 4096 characters, are split into blocks separated by \n. It is clearly a corner case in which the \n should be omitted when printing. Thanks for this test; I did some testing, but I did not catch the 4096-character buffer limit.

> I suspect this could be a problem for integration and also size of code
> after integration is relevant.

It is a corner case that can be addressed. I did not check the size of strings in busybox. However, once it is confirmed that size matters more than speed for busybox (I agree on this), it can instead be proposed to binutils (or coreutils), depending on which package ships strings. AFAIR, the aarch64 binary I found was provided by binutils.

Best regards,
R-
_______________________________________________
busybox mailing list
[email protected]
http://lists.busybox.net/mailman/listinfo/busybox
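The 4096-character corner case described above can be reproduced in isolation, without looping over /usr/bin. What follows is a minimal sketch that assumes a shell environment providing mktemp, head -c and tr (GNU coreutils and busybox both do). The helper name count_output_lines is hypothetical. It writes a single printable run of a chosen length to a temporary file, passes that file to a given strings command, and counts the lines the command prints.

```shell
#!/bin/sh
# Hypothetical helper: count_output_lines CMD LEN
# Builds a file holding exactly one printable run of LEN 'A' characters
# (no trailing newline), runs CMD on it and prints how many lines CMD emits.
# CMD is intentionally left unquoted so that "busybox strings" splits into
# a command plus its applet name.
count_output_lines() {
    cmd=$1
    len=$2
    tmp=$(mktemp) || return 1
    # LEN NUL bytes from /dev/zero, each translated into 'A'.
    head -c "$len" /dev/zero | tr '\0' 'A' > "$tmp"
    # wc -l counts newline characters in CMD's output.
    $cmd "$tmp" | wc -l
    rm -f "$tmp"
}
```

For example, `count_output_lines ./strings 5000` and `count_output_lines 'busybox strings' 5000` can be compared directly. An implementation that keeps long runs intact should report 1, while one that splits runs at a 4096-character buffer would report more than 1. Some systems pad the wc -l output with spaces.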
