Re: Proposal for a new applet: strings

Roberto A. Foglietta Sun, 23 Jul 2023 03:01:56 -0700

On Sun, 23 Jul 2023 at 11:42, tito <[email protected]> wrote:
>
> On Sun, 23 Jul 2023 00:36:09 +0200
> "Roberto A. Foglietta" <[email protected]> wrote:
>
> > On Sat, 22 Jul 2023 at 21:29, tito <[email protected]> wrote:
> > >
> > > On Sat, 22 Jul 2023 19:31:28 +0200
> > > "Roberto A. Foglietta" <[email protected]> wrote:
> > >
> > > > On Sat, 22 Jul 2023 at 15:40, tito <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'm not the maintainer so I can say nothing about integration,
> > > > > I can just point out things that look strange to me and my limited 
> > > > > knowledge.
> > > > > When I read that this code is faster vs other code as I'm a curious
> > > > > person I just try to see how much faster it is and why as there
> > > > > is always something to learn on the busybox mailing list.
> > > > > If in my little tests it is not faster then I think I'm entitled
> > > > > to ask questions about it as science results should be reproducible.
> > > > >
> > > > > For simple benchmarking maybe reading a big enough file
> > > > > into memory and feeding it to strings in a few 1000 iterations
> > > > > should do to avoid bias from hdd/sdd and system load, one shot shows:
> > > > >
> > > > > ramtmp="$(mktemp -p /dev/shm/)"
> > > > >  dd if=vmlinux.o of=$ramtmp
> > > > > echo $ramtmp
> > > > > /dev/shm/tmp.ll3G2kzKE1
> > > > >
> > > > > 1) coreutils strings
> > > > > time  strings $ramtmp > /dev/null
> > > >
> > > > This is not correct because you are reading a file in tmpfs while the
> > >
> > > Yes, this was exactly the purpose of the test to eliminate all
> > > factors connected to underlying block devices and time
> > > the speed of code of the different implementations.
> > >
> >
> > Which is wrong because you made an assumption that is far away from the
> > typical usage, and in some cases you cannot even use it, because a 4GB
> > ISO image you want to run strings over would not necessarily fit into a
> > tmpfs on every system. Abstract benchmarks can be fun but, as usual, do
> > not depict or measure reality. Extending this logic, we could trash
> > Ohm's law because we can reach a near-zero temperature in the laboratory!
>
> I see but dropping the caches etc doesn't seem to be a typical use case 
> either.


Dropping the cache is a trick to bring the system back to the state it
had right after boot, or as close to that as possible. It is
indispensable for a comparison with normal operation, which has a
larger variance in completion time from run to run.
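
To make the cold-cache measurement concrete, here is a minimal sketch
of what I mean. It assumes Linux and root privileges for writing to
/proc/sys/vm/drop_caches; the function names drop_caches and cold_run
are just illustrative helpers, not part of any existing script.

```shell
#!/bin/sh
# Sketch only: Linux-specific, and writing drop_caches needs root.

drop_caches() {
    # Flush dirty pages first so the kernel can actually free them.
    sync
    if [ -w /proc/sys/vm/drop_caches ]; then
        # 3 = free the page cache plus dentries and inodes
        echo 3 > /proc/sys/vm/drop_caches
    else
        echo "drop_caches: not writable, cache left warm" >&2
        return 1
    fi
}

cold_run() {
    # Try to start from a cold cache, then run the given command,
    # discarding its output so the terminal does not skew the timing.
    drop_caches || true
    "$@" > /dev/null
}
```

With a shell that has a time keyword (bash, for example), something
like "time cold_run strings vmlinux.o" gives one cold sample; repeating
it a few times shows the variance.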

>
> Using the same optimization flag -O3 the busybox applet in a real life
> system gives close empirical results, which is the results most
> people in their normal life use cases (one shot, no loops running,
> no files in memory, no dropped caches, no giant multi-GB files)
> will see so the performance increase is swallowed by the system
> or by other bottlenecks.
>

This is correct. AFAIK my busybox has been compiled with -O2; I have to check.


> I think the size will rather increase as there are a bunch of features
> missing that the original bb implementation already has:
>
> 1) multiple file handling (a must i would dare to say)

Which is not such a problem, after all:

for i in "$@"; do simply-strings "$i" | sed -e "s|^|$i:|"; done

The sed also puts the file name in front of each string, which
is useful for grepping. Moreover, the single-file limitation invites
customizing the approach, for example:

for i in "$@"; do simply-strings "$i" | grep -w "word" && break; done; echo "$i"

However, I admit that you are right about multiple-file
input. Personally, I do not need it at all, and if I ever do, I handle
it with a custom for loop.
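
A slightly more robust variant of the loop above could look like the
sketch below. The names multi_strings and STRINGS_CMD are hypothetical;
STRINGS_CMD should point at the single-file binary and only falls back
to the system strings as a placeholder. The file name goes to awk
through the environment, so characters that are special to sed cannot
break the prefixing.

```shell
#!/bin/sh
# Hypothetical wrapper: run a single-file strings tool over several
# files and prefix each output line with "filename: ".
# STRINGS_CMD is an assumption; set it to the single-file binary.
STRINGS_CMD="${STRINGS_CMD:-strings}"

multi_strings() {
    for f in "$@"; do
        # ENVIRON[] is read literally by awk, so slashes, ampersands
        # and backslashes in the path are printed unchanged.
        "$STRINGS_CMD" "$f" | FNAME="$f" awk '{ print ENVIRON["FNAME"] ": " $0 }'
    done
}
```

Usage would be along the lines of
"multi_strings /bin/ls /bin/cat | grep -w word".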


> 2) -a -f -o -n -t command line options
> The options are:
>   -a - --all                Scan the entire file, not just the data section [default]
>   -f --print-file-name      Print the name of the file before each string
>   -n --bytes=[number]       Locate & print any NUL-terminated sequence of at
>                             least [number] characters (default 4).
>   -t --radix={o,d,x}        Print the location of the string in base 8, 10 or 16
>   -o                        An alias for --radix=o
>

Yes, strings has a lot of options, and the busybox applet has several
too. This is the strongest criticism against proceeding with an
integration. I will check whether I can put an optimization into bb
strings, just for my own curiosity.


> 3) output compatible with original gnu strings
>
> > In attachment the new version with the test suite and the benchmark
> > suite in the header. The benchmark suite did not change with respect
> > to the script file I just sent.
> >
> > Best regards, R-
>
> BTW: there still seem to be corner-cases:
> list=`find /usr`
> for i in $list; do if test -f $i; then  ./strings $i > out1.txt; strings $i > out2.txt; diff -q out1.txt out2.txt; fi; done
> Files out1.txt and out2.txt differ
> Files out1.txt and out2.txt differ
> Files out1.txt and out2.txt differ
> Files out1.txt and out2.txt differ
>
> test is still running....

OK, I will do a run. Can you please echo the filenames instead?

for i in $list; do if test -f "$i"; then ./strings "$i" > out1.txt; strings "$i" > out2.txt; diff -q out1.txt out2.txt >/dev/null || echo "$i"; fi; done
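
If file names with spaces are a concern, the same comparison could be
sketched as a small function. The name compare_strings and its three
arguments are my own invention for illustration; it still assumes no
newlines in file names.

```shell
#!/bin/sh
# compare_strings DIR OURS REF
# Print every regular file under DIR for which the commands OURS and
# REF produce different output. Sketch only: file names containing
# newlines are not handled.
compare_strings() {
    dir=$1; ours=$2; ref=$3
    tmp=$(mktemp -d) || return 1
    find "$dir" -type f | while IFS= read -r f; do
        "$ours" "$f" > "$tmp/out1" 2>/dev/null
        "$ref"  "$f" > "$tmp/out2" 2>/dev/null
        # cmp -s is silent and only sets the exit status
        cmp -s "$tmp/out1" "$tmp/out2" || printf '%s\n' "$f"
    done
    rm -rf "$tmp"
}
```

For the test above that would be "compare_strings /usr ./strings strings".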

Thanks, R-
_______________________________________________
busybox mailing list
[email protected]
http://lists.busybox.net/mailman/listinfo/busybox
