Re: Proposal for a new applet: strings

Roberto A. Foglietta Mon, 24 Jul 2023 01:37:35 -0700

On Sun, 23 Jul 2023 at 16:38, tito <farmat...@tiscali.it> wrote:
>
> On Sun, 23 Jul 2023 16:17:54 +0200
> "Roberto A. Foglietta" <roberto.foglie...@gmail.com> wrote:
>
> > On Sun, 23 Jul 2023 at 13:18, tito <farmat...@tiscali.it> wrote:
> > >
> > > On Sun, 23 Jul 2023 12:00:56 +0200
> > > "Roberto A. Foglietta" <roberto.foglie...@gmail.com> wrote:
> >
> > > > >
> > > > > 1) multiple file handling (a must i would dare to say)
> > > >
> > > > Which is not such a problem, after all
> > > >
> > > > for i in "$@"; do simply-strings "$i" | sed -e "s|^|$i:|"; done
> > > >
> > > > sed also puts the file name in front of each string, which
> > > > is useful for grepping. However, the single-file limitation leads
> > > > to customizing the approach:
> > > >
> > > > for i in "$@"; do simply-strings "$i" | grep -w "word" && break; done;
> > > > echo "$i"
> > >
> > > Don't cheat, this change would break other people's scripts.
> >
> > Other people are no longer in the picture, since we established
> > that reinventing the wheel is neither efficient nor useful.
> >
> >
> > >
> > > > Yes, strings has a lot of options and busybox also has several
> > > > options. This is the best criticism of proceeding with an integration.
> > > > I will check if I can put an optimization into bb strings, just for my
> > > > own curiosity.
> > >
> > > This would be far better than reinventing the wheel.
> > >
> >
> > Reinventing the wheel is a good way to understand how the wheel works
> > and improve it. We just concluded that there is no reason to reinvent
> > the wheel completely. However, simple-strings can be useful when
> > deploying it fills a gap better than replacing a fundamental system
> > component like busybox, which can break future OTA updates.
>
> Ever thought about compiling a busybox copy with only one applet
> or few applets that need fixes or updates or new features ?
> This was done a lot in the first android roms.
>
> > In particular, it is fine as a service/rescue/recovery image in which
> > the space is limited and the full compatibility with strings or
> > busybox strings is not necessary and for everything else custom
> > scripts can easily compensate.
> >
> > About improving busybox strings and more in general its printf
> > performance, it is about this:
> >
> > setvbuf(stdout, (char *)stdout_buffer, _IOFBF, BUFSIZE);
> >
> > Obviously a large static buffer can impact the footprint but as long
> > as malloc() is used into the busybox - and in its library I remember
> > there were sanitising wrappers for it - then it would not be such a
> > big deal to use a dynamically allocated buffer. The tricky aspect is
> > about the applet forking. A topic that I do not know but I saw an
> > option "no fork" in the config. I did not even start to see the code,
> > therefore I am just wondering about.
>
> Yes busybox code is tricky. This NO_FORK stuff is a black magic
> I really haven't understood yet.


I did not investigate that option or the code, but my impression is
that it would be useful in two different cases:

1 - single applet busybox
2 - NOMMU systems for which v/fork is a burden

My speculation is that when I call busybox, it forks into the applet
function, which drops everything it does not need, so each call is a
fully detached process. With NO_FORK, I suppose that everything stays
in memory and the kernel keeps as much of it as possible, for example
the busybox code, as a shared object. Each call then only duplicates
the stack, like a function run in a pthread does. Therefore, every
buffer defined in the applet is duplicated on each stack, unless a
special definition overrides this general principle.

About using setvbuf() in busybox:

setvbuf(stdout, (char *)stdout_buffer, _IOFBF, BUFSIZE);

It does not seem a viable solution for every applet. Therefore, I
would add it only to strings and a few others. A grep over the busybox
code shows that this function is used in just a few applets:

$ grep -rn setvbuf . 2>/dev/null | grep '\.c:'
./miscutils/hexedit.c:263: setvbuf(stdout, xmalloc(sz), _IOFBF, sz);
./coreutils/tee.c:126: setvbuf(stdout, NULL, _IONBF, 0);
./shell/match.c:105: setvbuf(stdout, NULL, _IONBF, 0);
./runit/svlogd.c:597: setvbuf(ld->filecur, NULL, _IOFBF, linelen); ////
./runit/svlogd.c:860: setvbuf(ld->filecur, NULL, _IOFBF, linelen); ////
./runit/svlogd.c:1128: setvbuf(stderr, NULL, _IOFBF, linelen);

The man page https://linux.die.net/man/3/setvbuf explains the three
buffering modes. Going by it, tee and match switch stdout to
unbuffered, svlogd asks for full buffering but passes a NULL buffer,
and only hexedit installs a full buffer of its own. Full buffering
might also be useful in strings, and in dd when it writes to stdout.
Considering that defining a static buffer in an applet increases the
footprint, it makes sense to use malloc() (through a BB wrapper for
it). After all, the malloc() code is already included in busybox, and
one more use adds just the ASM needed to handle that function call.
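
As a minimal sketch of that idea, assuming plain malloc() instead of the
busybox wrapper (hexedit uses xmalloc()), a helper could look like the one
below. The name use_full_buffering() is hypothetical and is not part of
busybox:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not busybox code: switch a stream to full
 * buffering with a heap-allocated buffer of `size` bytes.
 * Returns the buffer, or NULL on failure. The caller must free the
 * buffer only after the stream has been closed. */
char *use_full_buffering(FILE *fp, size_t size)
{
	char *buf = malloc(size);

	if (buf == NULL)
		return NULL;
	/* setvbuf() has to be called before any other operation on fp */
	if (setvbuf(fp, buf, _IOFBF, size) != 0) {
		free(buf);
		return NULL;
	}
	return buf;
}
```

An applet would call use_full_buffering(stdout, 4096) before its first
printf(). No static array is added to the binary, at the cost of one
allocation per run.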

Moreover, in my simple-strings I used 4096 because I suppose that it
is the kernel memory page size and therefore necessarily a contiguous
physical RAM allocation, the biggest one granted without adding extra
code apart from the malloc(). On some systems this value could be
different, but I would not make a getconf-like request to obtain it;
possibly busybox does that by itself (or it might not). The kernel
memory page size is probably fixed at compile time for the specific
target architecture. I bet that almost all of them use 4 KiB.
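
Should the real value be wanted anyway, the C counterpart of
"getconf PAGESIZE" is sysconf(_SC_PAGESIZE). The sketch below falls back
to 4096 when the query fails; pick_buffer_size() is a hypothetical name
and I am not claiming that busybox does it this way:

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: use the memory page size as the stdout buffer
 * size, falling back to the assumed 4096 if sysconf() reports an error. */
size_t pick_buffer_size(void)
{
	long page = sysconf(_SC_PAGESIZE);

	return page > 0 ? (size_t)page : 4096;
}
```

This costs one library call at startup, so hardcoding 4096 remains the
cheaper choice if the benchmarks show no difference.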

Hint: it does not make sense to investigate all these details in depth
before starting to code. We can assume NO_FORK=0, NOMMU=0 and
MEM_PAGE_SIZE=4096 and, with these assumptions, run some benchmarks on
those systems/architectures for which the hypotheses hold. If the
benchmarks show a significant improvement in common usage cases, then
it is worth investigating also the cases not covered by the initial
hypotheses. IMHO, 4 KiB would be fine even if the memory page size is
smaller.
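
A rough way to run such a benchmark, sketched under the assumption that
CPU time from clock() is a good enough measure, is to write many short
lines with different setvbuf() modes and compare. time_writes() is a
hypothetical helper, not existing code:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical micro-benchmark: write `lines` short lines to `path`
 * using the given setvbuf() mode and buffer size.
 * Returns the elapsed CPU time in seconds, or -1.0 on error. */
double time_writes(const char *path, int mode, size_t size, long lines)
{
	FILE *fp = fopen(path, "w");
	char *buf = NULL;
	clock_t start;
	long i;

	if (fp == NULL)
		return -1.0;
	if (mode != _IONBF) {
		buf = malloc(size);
		if (buf == NULL) {
			fclose(fp);
			return -1.0;
		}
	}
	if (setvbuf(fp, buf, mode, buf ? size : 0) != 0) {
		fclose(fp);
		free(buf);
		return -1.0;
	}
	start = clock();
	for (i = 0; i < lines; i++)
		fprintf(fp, "string number %ld\n", i);
	fclose(fp);	/* flushes whatever is still in the buffer */
	free(buf);	/* safe only after the stream is closed */
	return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling it with _IOFBF and 4096, then with _IOLBF, then with _IONBF on
the same path (for example /dev/null or a pipe) gives three numbers to
compare on each target system.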

I hope this helps, R-
_______________________________________________
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox
