Re: [Toybox] this week in weird coreutils stuff: chmod

2024-05-30 Thread Rob Landley



On 5/29/24 14:20, enh wrote:
> seems to have broken the macOS build?
> ```
> lib/lib.c:953:10: error: conflicting types for 'string_to_mode'
> unsigned string_to_mode(char *modestr, unsigned mode)
>  ^
> ./lib/lib.h:413:10: note: previous declaration is here
> unsigned string_to_mode(char *mode_str, mode_t base);
>  ^
> ```

Oops, missed one. Try commit 3c276ac106a4.

So what _is_ mac using... Sigh:

/Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/usr/include/sys/_types/_mode_t.h:typedef __darwin_mode_t mode_t;
/Library/Developer/CommandLineTools/SDKs/MacOSX13.1.sdk/usr/include/sys/_types.h:typedef __uint16_t __darwin_mode_t; /* [???] Some file attributes */

They typedef it to unsigned short instead of unsigned int. Even though type
promotion will pass an int on the stack for anything smaller than an int, and
use an int register to do the math...

I guess back in 1974 "int" was a 16 bit type, and they stuck with that in the
move to 32 and then 64 bit processors because SUGO times 3 bits each is only
using 12 of those 16 bits, leaving 4 for file types and we've only used 7 of
those 16 combinations for IFDIR and IFBLK and so on (well, 8 on mac but the
header says IFWHT is obsolete), clearly that will never run out...
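
The bit budget above can be sketched in a few lines. This assumes the usual
layout where S_IFMT is 0170000 (true on Linux and macOS as far as I know, but
not something POSIX promises), and split_mode() is a made-up helper for
illustration, not toybox code:

```c
#define _XOPEN_SOURCE 700  // S_IFMT/S_IFDIR under strict -std= modes
#include <sys/stat.h>

// Hypothetical helper (not from toybox): split a mode word into the 4-bit
// file type field and the 12 permission bits (special + user/group/other).
// Assumes S_IFMT == 0170000, so type and permissions together fit in 16 bits.
void split_mode(unsigned mode, unsigned *type, unsigned *perm)
{
  *type = (mode & S_IFMT) >> 12;  // one of 16 possible file type codes
  *perm = mode & 07777;           // setuid/setgid/sticky plus rwx times 3
}
```

Passing st_mode through a plain unsigned like this loses nothing, whichever
width the platform picked for mode_t.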

*shrug* Removing all uses of mode_t and using "unsigned" instead consistently
should work fine. Only "struct stat" should really care, and even then they
could just use the actual primitive type in the struct definition...

(I'm not a fan of data hiding without some _reason_ for it. I used to humor it a
lot more, but now I want to know what/why it's doing.)

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] this week in weird coreutils stuff: chmod

2024-05-29 Thread Rob Landley
On 5/28/24 08:00, enh via Toybox wrote:
> apparently chmod allows something like
> 
>   chmod u+rwX-s,g+rX-ws,o+rX-wt
> 
> as a (far less readable!) synonym for
> 
>   chmod u+rwX,u-s,g+rX,g-ws,o+rX,o-wt
> 
> i'm told that toybox silently accepts the former too, but does not
> interpret it as if it means the latter?

Try commit a2c4a53e155c.

(Needed to zero a variable inside the loop rather than just once at the
beginning. Random cleanups while I was there, plus tests.)
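
For anyone following along, here is a minimal sketch of how several
operator groups in one clause can be handled. It is NOT toybox's
string_to_mode() and I have not modeled umask or error reporting; the function
name and the simplified "who" masks are assumptions for illustration. The
point is only that the accumulated permission bits get zeroed for every
operator, not once per clause:

```c
#define _XOPEN_SOURCE 700  // S_IFMT/S_IFDIR under strict -std= modes
#include <string.h>
#include <sys/stat.h>

// Hypothetical sketch, NOT toybox's string_to_mode(): apply a symbolic mode
// string to "mode". Supports who=[ugoa]*, op=[+-=], perm=[rwxXst]*, with
// several op+perm groups per comma-separated clause. Ignores umask and
// stops at the first character it does not understand.
unsigned apply_symbolic_mode(const char *s, unsigned mode)
{
  while (*s) {
    unsigned who = 0;

    // The "who" list is parsed once and shared by every group in the clause.
    for (; *s && strchr("ugoa", *s); s++) {
      if (*s == 'u') who |= 04700;
      else if (*s == 'g') who |= 02070;
      else if (*s == 'o') who |= 01007;
      else who |= 07777;
    }
    if (!who) who = 07777;

    // "u+rwX-s" is two groups: "+rwX" then "-s".
    while (*s && strchr("+-=", *s)) {
      char op = *s++;
      unsigned bits = 0;  // zeroed per group, not once at the beginning

      for (; *s && strchr("rwxXst", *s); s++) {
        if (*s == 'r') bits |= 0444;
        else if (*s == 'w') bits |= 0222;
        else if (*s == 'x') bits |= 0111;
        else if (*s == 'X') {
          // X only applies to directories or already-executable files
          if ((mode & 0111) || (mode & S_IFMT) == S_IFDIR) bits |= 0111;
        } else if (*s == 's') bits |= 06000;
        else bits |= 01000;  // 't'
      }
      bits &= who;

      if (op == '=') mode = (mode & ~who) | bits;
      else if (op == '+') mode |= bits;
      else mode &= ~bits;
    }

    if (*s != ',') break;
    s++;
  }

  return mode;
}
```

With this sketch, "u+rwX-s,g+rX-ws,o+rX-wt" and the longer
"u+rwX,u-s,g+rX,g-ws,o+rX,o-wt" give the same result for the same starting
mode. If bits were only zeroed once per clause, the "-s" group would also
clear the r and w bits the "+rwX" group had just set.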

Rob


Re: [Toybox] strlower() bug

2024-05-29 Thread Rob Landley
On 5/22/24 09:30, enh wrote:
> On Tue, May 14, 2024 at 2:58 PM Rob Landley wrote:
>> It looks like macos towlower() refuses to return expanding unicode characters.
>> Possibly to avoid exactly the kind of bug this fixed, in exchange for corrupting
>> the data.
> 
> yeah, i don't know whether it's on purpose or a bug, but that does
> seem to be the case... i tested with another Latin Extended-B
> character whose uppercase and lowercase forms are both in the same
> block (and thus have the same utf8 encoding length), and macOS
> towlower() does work for that.
> 
> hmm, actually maybe it's just that their Unicode data is out of date?
> it looks like they don't know about Latin Extended-C at all? a code
> point like U+2c62 that gets _smaller_ (because it's in the IPA
> Extensions block) doesn't work either.
> 
> i did try looking in FreeBSD, but i've never understood how this stuff
> works there.
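
To make the "expanding" and "shrinking" cases concrete, here is a small sketch
of the UTF-8 length arithmetic involved. The helper names are made up for
illustration. The code points are the ones from this thread: U+023A
lowercases to U+2C65 (2 bytes become 3), and U+2C62 lowercases to U+026B
(3 bytes become 2). It deliberately does not call towlower(), since the result
of that depends on the libc and locale:

```c
// Number of bytes UTF-8 needs to encode a code point (0 if out of range).
// Surrogates are not special-cased in this sketch.
int utf8_len(unsigned cp)
{
  if (cp < 0x80) return 1;
  if (cp < 0x800) return 2;
  if (cp < 0x10000) return 3;
  if (cp < 0x110000) return 4;

  return 0;
}

// How many bytes a case conversion adds (positive) or removes (negative).
int case_len_delta(unsigned from, unsigned to)
{
  return utf8_len(to) - utf8_len(from);
}
```

A strlower() that converts in place has to cope with a positive delta, which
is presumably the class of bug the original fix was about.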

FreeBSD questions go to Ed Maste, who is theoretically
subscribed here but keeps getting unsubscribed by gmail bounces.

> i'm guessing from the fact i've never found them that the
> implementations are all generated at build time, subtly enough that my
> attempts to grep for the generators fail.
> 
> hmm... looking at Apple's online FreeBSD code, it looks like they have
> very different (presumably older) FreeBSD code
> [https://opensource.apple.com/source/Libc/Libc-320.1.3/locale/FreeBSD/tolower.c.auto.html],
> and the footer of the file that reads implies that they're using data
> from Unicode 3.2 (released in 2002, which would make sense given the
> 2002 BSD copyright date in the tolower.c source):

Sigh, can't they just ship machine consumable bitmaps or something? I can have
my test plumbing pull "standards" files, ala:

https://github.com/landley/toybox/blob/master/mkroot/packages/tests

But an organization shipping a PDF or 9 interlocking JSON files with a turing
complete stylesheet doesn't help much.

> so, yeah, i don't think there was anything clever or mysterious going
> on here --- macOS is just using Unicode data from 22 years ago. (which
> is an amusing real-world example of why i keep saying "you probably
> don't want to get into the business of redistributing Unicode data; it
> changes every year" :-) )

A youtuber named Ryan McBeth is fond of explaining the difference between a
"problem" and a "dilemma". A problem has an obvious solution, which may be
painful or expensive but there's not a lot of disagreement on what success looks
like. A dilemma has multiple ways to address it, each of which has something
uniquely wrong with it. Problems don't lead to indecision, dilemmas do (and thus
accumulate).

In this case, the dilemma is "trusting libc to get it wrong differently in each
new environment" vs "taking a large expense onboard with borderline xkcd
violation". (If there is an xkcd strip explaining why not to do something, you
probably shouldn't do it. In this case https://xkcd.com/927/ )

Which is _sad_ because there's only a dozen ispunct() variants that read a bit
out of a bitmap (and haven't significantly changed since K&R: neither isblank()
nor isascii() is worth the wrapper), plus a toupper/tolower pair that map
integers with "no change" being the common case. Plus unicode has wcwidth().
Yes, it's over a (sparse!) table with space for a million entries, but CSV
encoding all that data in human+machine readable ASCII should gzip down to what,
500k?

Let's see, the bits seem to be alpha, cntrl, digit, punct, and space, and then
width (mostly 0, 1, or 2 but we've talked about exceptions), and two translation
codepoints for toupper and tolower.

You can easily derive isalnum() and isxdigit(), and isascii() and isblank() are
trivial according to the man page. If the table has upper and lower mappings
(I.E. what character this turns into, zero if it doesn't) then you don't need
isupper() or islower() bits unless there's cases where "this isn't upper case
but can be converted to lower case" (which aren't covered by having BOTH
toupper() and tolower() mappings for the same character).

I'm honestly unclear on what "isgraph" does, "any printable character except
space"... if isprint() means "not width 0" then that's just adding && !isspace()
so doesn't need to be in the table.

So code, alpha, cntrl, digit, punct, space, width, upper, lower. Something like:

0,0,0,0,0,0,0,0,0
13,0,1,0,0,1,0,0,0
32,0,0,0,0,1,1,0,0
57,0,0,1,0,0,1,0,0
58,0,0,0,1,0,1,0,0
65,1,0,0,0,0,1,0,97
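
A rough sketch of consuming rows in that format, and deriving the predicates
that don't need their own column. The struct and function names are
hypothetical, and ci_isprint() bakes in the "isprint() means not width 0"
assumption from above rather than anything a standard says:

```c
#include <stdio.h>

// One row: code,alpha,cntrl,digit,punct,space,width,upper,lower
struct charinfo {
  unsigned code, alpha, cntrl, digit, punct, space, width, upper, lower;
};

// Parse one CSV line; returns 0 on success, -1 if it lacks the 9 fields.
int parse_charinfo(const char *line, struct charinfo *c)
{
  int n = sscanf(line, "%u,%u,%u,%u,%u,%u,%u,%u,%u", &c->code, &c->alpha,
    &c->cntrl, &c->digit, &c->punct, &c->space, &c->width, &c->upper,
    &c->lower);

  return n == 9 ? 0 : -1;
}

// Derived predicates: none of these need a bit in the table.
int ci_isalnum(const struct charinfo *c) { return c->alpha || c->digit; }
int ci_isprint(const struct charinfo *c) { return c->width != 0; }
int ci_isgraph(const struct charinfo *c) { return ci_isprint(c) && !c->space; }
int ci_isupper(const struct charinfo *c) { return c->lower != 0; }
int ci_islower(const struct charinfo *c) { return c->upper != 0; }

// A zero mapping means "no change", which is the common case.
unsigned ci_tolower(const struct charinfo *c)
{
  return c->lower ? c->lower : c->code;
}

unsigned ci_toupper(const struct charinfo *c)
{
  return c->upper ? c->upper : c->code;
}
```

Feeding it the row for 65 gives an upper case alphanumeric that lowers to 97,
and the row for 32 is printable but not graph.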

No, that doesn't cover weird stuff like the right-to-left gearshift or the
excluded mapping ranges or even the low ascii characters having special effects
like newline and tab, but those aren't really "characters" are they? Special
case the special cases, don't try to represent them 

Re: [Toybox] microcom.c discarding data due to TCSAFLUSH

2024-05-23 Thread Rob Landley
On 5/20/24 09:42, Yi-Yo Chiang via Toybox wrote:
> Is there any particular reason to use TCSAFLUSH here?

Partly because it's what strace said busybox and minicom were doing, and partly
because I've had serial hardware that produced initial static on more than one
occasion.

In this case, it looks like Elliott also put it in his initial contribution
(commit 12fcf08b5c96).

> If not, can we change to TCSADRAIN or TCSANOW. I don't think there is good
> reason to _discard received data_ just to set the terminal mode...? Is there
> really a real world case that the device termios is so dirty that all data, from
> before setting raw mode, must be discarded?

I've seen multiple instances where there was initial noise from the port going
live before the speed stabilized, or static from a physical connection plugging
in or powering up, or truncated bootloader messages that filled up the input
buffer then abruptly cut off.

> I also tried to modify the microcom code to skip tcsetattr() if the device
> termios is already equal to the mode we are setting it.
> `if (old_termios != new_termios) tcsetattr(new_termios, TCSAFLUSH)`
> However this doesn't work because microcom always tries to set the device baud.

Hmmm, you're right, it shouldn't mess with that unless we specify -s. I could
also make TCSAFLUSH only happen when we do -s (because otherwise it's an
existing connection and we're not messing with it, but I still need to make sure
it's in raw mode)...

Note: FLAG(s)*TCSAFLUSH becomes 0 (TCSANOW) in the absence of -s.
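
Spelled out as a sketch (not the code in the commit; set_raw() and
raw_mode_action() are hypothetical names, and cfmakeraw() is a BSD/glibc
extension rather than POSIX):

```c
#define _DEFAULT_SOURCE  // cfmakeraw() on glibc
#include <termios.h>

// Pick the tcsetattr() action: only discard pending input when the caller
// is also changing the line speed. Uses a ternary instead of multiplying by
// the flag, so it does not rely on TCSANOW being 0.
int raw_mode_action(int set_speed)
{
  return set_speed ? TCSAFLUSH : TCSANOW;
}

// Put fd into raw mode, optionally setting the speed. Returns 0 on success,
// -1 on failure (for example when fd is not a terminal).
int set_raw(int fd, int set_speed, speed_t speed)
{
  struct termios t;

  if (tcgetattr(fd, &t)) return -1;
  cfmakeraw(&t);
  if (set_speed && (cfsetispeed(&t, speed) || cfsetospeed(&t, speed)))
    return -1;

  return tcsetattr(fd, raw_mode_action(set_speed), &t);
}
```

So an inherited pty or an already configured serial port still ends up in raw
mode, but its speed is left alone and nothing already received is thrown away.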

> For example a pty device might be configured to use baud 38400,

Why set the baud at all on a pty? A pseudo-terminal doesn't have a baud rate,
leave it alone. (You can also inherit a serial port that was set up by the
bootloader and should again just be left alone...)

> but microcom
> would want it to be 115200, thus flushing its data. but pty doesn't really care
> about the baud most of the time AFAIK, so flushing data in this case just seems
> disruptive to the user experience.

Setting baud rate and flushing are two different switches in the interface, but
in this case flushing only when setting the baud rate seems a good use of the
existing controls.

Try commit 2043855a4bd5

Rob


Re: [Toybox] [PATCH] netcat: clarify documentation.

2024-05-22 Thread Rob Landley
On 5/20/24 07:06, enh via Toybox wrote:
> "Collate" means "sort", but -O is like -o other than buffering.

It means "group". (The dictionary says "gather or arrange in the proper
sequence".) I can see the confusion, but the collate button on the copier in
high school stapled pages together (the pages came out the same order either
way, the question was should the groups be attached). I also had a data entry
work-study job in college where I had to "collate" reports (basically doing the
same thing by hand, except using transparent file folders and this little
plastic strip that slid along the edge to hold the pages in).

We just had a thread about "buffering", and I find that _less_ illuminating in
context.

Sigh, "grouped", "streamed", "together", "terse", "what I thought it was doing
until I actually compared the output side by side", "showing the actual data
instead of the transaction boundaries that survived the nagle algorithm",
"assembled", "congregated", "collected", "packaged", "decluttered", "thesaurus"...

How does "packed" sound?

Rob


Re: [Toybox] [PATCH] xputs: Do flush

2024-05-22 Thread Rob Landley
On 5/20/24 08:32, Yi-Yo Chiang wrote:
> Thanks! Adding TOYFLAG_NOBUF worked.
> 
> I feel the same way about "manual flushing of the output buffer is a terrible
> interface". I asked myself the question "Why am I manually flushing so much?
> There must be a better way..." multiple times when I wrote the other patch that
> does s/xprintf/dprintf/, s/xputs/xputsn/

It's an annoying design quandary.

> > Your other patch changes a bunch of xprintf() to dprintf() which is even _more_
> > fun because dprintf() writes directly to the file descriptor (bypassing the
> > buffer in the libc global FILE * instance "stdout"), which means in the absence
> > of manual flushing the dprintf() data will go out first and then the stdio data
> > will go out sometime later, in the wrong order. Mixing the two output formats
> > tends to invert the order that way, unless you disable output buffering.
> 
> Which is the reason I replaced those all with the "flush" functions (xputsn) or
> direct fd-write functions (dprintf), so that their order won't shuffle.
> Anyway the problem is moot now that we have TOYFLAG_NOBUF.

Eh, not moot. Shifted. Currently there's one command using TOYFLAG_NOBUF, and a
lot of recent buffering fixes:

ea119151ccc5
59b041d14aec
afeed2d46a9a
a57e42a386b0
ca6bde9e1c43

I should probably audit all the commands and figure out which buffering type to
use for each. (grep currently finds manual fflush() in hexedit, login, watch,
and ps).

But not today...

> > But that hasn't been popular, and it's a pain to implement in userspace (because
> > we don't have access to multiple cheap timers like the kernel does, we need to
> > take a signal and there's a limited number of signals).
> 
> do you run on anything that doesn't have real-time signals? i was
> going to say that -- since toybox is a closed world -- you could just
> use SIGUSR2, but i see that dhcp is already using that! but if you can
> assume real-time signals, there are plenty of them...

Within toybox I could probably come up with something, true. Although fflush()
locking is still a bit problematic if I'm not depending on thread
infrastructure. (Either I don't use FILE * and do it myself, or I require libc
to be thread aware.)

Rob


Re: [Toybox] [PATCH] xputs: Do flush

2024-05-22 Thread Rob Landley
On 5/20/24 07:36, enh wrote:
>> Adding flushing to xputs() is an Elliott question because he's the one who
>> can presumably keep stdio buffer semantics in his head. They make zero sense to
>> me. I added a flush to xputsn() because in line buffering mode it was the
>> "dangerous" one that had no newline thus no flush, but then when we go to block
>> buffering mode xputs() needs a flush just like xputsn() would, and MAYBE it's
>> good to have the flush because in line buffer mode it would be a NOP? Except the
>> command selected block buffering mode precisely BECAUSE it didn't want to flush
>> each line, so why should xputs() do it when the command asked not to? And if
>> xputs() is doing it, why is xprintf() _not_ doing it? And if xprintf() _is_
>> doing it, then we're back to basically not buffering the output...
> 
> this to me was exactly why it should be "everything flushes" or
> "nothing flushes". not "some subset of calls for some subset of
> inputs", because no-one can keep all the special cases in their heads.
> and "everything flushes" is problematic from a performance
> perspective, so "nothing flushes" wins by default. (but, yes, when we
> have our own kernel, having a time-based component to the buffering layer's
> flushing is an interesting idea :-) )

Eh, now Yi-Yo's pointed me back at timer_create() and reminded me of realtime
signals, it seems like the plumbing is there to make FILE * output use nagle.

The problem is a userspace wrapper trying to fflush() from signal context
assumes everything's written in an async-safe way (and of course that everyone
else will SA_RESTART when interrupted), which either means spray it down with
thread locking or use cmpxchg() within the flush() implementation, and either
way involves me trusting libc in a way I currently don't. (And I can't do it
"right" myself due to FILE * internals being opaque, I have to wrap an unknown
implementation...)
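
For reference, the boring way to dodge the async-safety problem is to not
flush from the handler at all: the handler only sets a flag, and the flush
happens later from normal context. A minimal sketch, with hypothetical
function names. Note this is NOT a real nagle-style timeout, because a process
blocked in a long read() or computation never gets back to check the flag,
which is exactly the limitation being complained about:

```c
#define _POSIX_C_SOURCE 200809L  // sigaction()
#include <signal.h>
#include <stdio.h>
#include <string.h>

// Set from signal context, read from normal context.
static volatile sig_atomic_t flush_due;

// Async-signal-safe: touches nothing but the flag.
static void on_flush_signal(int sig)
{
  (void)sig;
  flush_due = 1;
}

// Install the handler for signo (SIGALRM, a realtime signal, ...).
// SA_RESTART so interrupted system calls resume. Returns 0 on success.
int install_flush_handler(int signo)
{
  struct sigaction sa;

  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = on_flush_signal;
  sa.sa_flags = SA_RESTART;
  sigemptyset(&sa.sa_mask);

  return sigaction(signo, &sa, 0);
}

// Call from the main loop. Returns 1 if it flushed, 0 if nothing was due,
// -1 if fflush() failed.
int flush_if_due(FILE *fp)
{
  if (!flush_due) return 0;
  flush_due = 0;

  return fflush(fp) ? -1 : 1;
}
```

Whatever arms the timer (timer_create(), setitimer(), alarm()) is left out
here. The sketch only shows the half that stays out of libc internals.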

But it sounds quite feasible for _libc_ to do setvbuf(_IONAG) these days. :)

(Sigh, or threads with signalfd(). Grrr.)

>> I like systematic solutions that mean the problem never comes up again. Elliott
>> has been advocating the whack-a-mole "fix them as we find them" approach here
>> which I am not comfortable with. I've been leaning towards adding a
>> TOYFLAG_NOBUF so we can select a third buffering type, and go back to "do not
>> buffer output at all, flush after every ANSI write function" for at least some
>> commands I'd like to be able to reason about. Especially for commands like this
>> where output performance really doesn't seem like it should be an issue.
> 
> +1 --- an inherently interactive program like this seems reasonable to
> be unbuffered. (except for file transfer?)

Isn't file transfer sending 4k blocks?

Buffering gets really weird with this kind of program anyway: when you're
sending data across a serial port that's breaking it up into individually
transmitted bytes, and depending on what your 16550a-or-similar threshold is set
to the recipient's probably getting notified of each group of 8 bytes. (And yes,
the hardware uses SOMETHING LIKE NAGLE internally to enforce the input and
output notification thresholds.)

Then you layer ppp over it, which breaks your 4k into something like 1.5k
chunks, then does nagle on that trying to fill out the last packet, and that's
assuming there were no pipes in there which do their own re-collating on the
data. And then of course files, once there's filesystem involvement... but of
course the VFS is in there before that marshalling data into and out of page
cache...

The point of the output buffer is to deal with chunks of data "big enough" to
amortize the transaction overhead. Zerocopy of the data has always been somewhat
aspirational, and handing off buffers between page table contexts is often more
expensive than copying it. (Or not! It changes between hardware generations and
I don't even PRETEND to be current on how the "mitigations for cache
speculation side channel attacks" differ between different kernel versions
running on different arm processors...)

There comes a point where "locality within process good, launch largeish buffer
out into the operating system, wave bye-bye as it goes off into the machinery"
is the best I can do. Although the definition of largeish still has the dregs of
moore's law clinging to it with the recent 4k->256k push. (xmodem had 128 byte
packets, I suppose it's roughly the same jump...)

>> https://lists.gnu.org/archive/html/coreutils/2024-03/msg00016.html
> 
> (fwiw, i think that was just some internet rando asking for it, no?
> and they didn't actually implement it?)

Padraig's reply was "this does seem like useful functionality" and a pointer to
the libc people, and then there were over a dozen additional replies in the
thread, so I wouldn't call it a clear no...

> do you run on anything that doesn't have real-time signals? i was
> going to say that -- since toybox is a closed world -- you 

Re: [Toybox] [PATCH] xputs: Do flush

2024-05-19 Thread Rob Landley
On 5/18/24 21:53, Yi-Yo Chiang wrote:
> What I wanted to address with this patch are:
> 
> 1. Fix this line of
> xputs() https://github.com/landley/toybox/blob/master/toys/net/microcom.c#L113
> The prompt text is not flushed immediately, so it is not shown to the user until
> the escape char is entered (which defeats the purpose of the prompt, that is to

I agree you've identified two problems (unflushed prompt, comment not matching
code) that both need to be fixed. I'm just unhappy with the solutions, and am
concerned about a larger design problem.

I implemented TOYFLAG_NOBUF and applied it to this command. The result compiles
but I'm not near serial hardware at the moment, does it fix the problem for you?

Trying to fix it via micromanagement (adding more flushing and switching some
but not all output contexts in the same command between FILE * and file
descriptor) makes my head hurt...

Adding flushing to xputs() is an Elliott question because he's the one who
can presumably keep stdio buffer semantics in his head. They make zero sense to
me. I added a flush to xputsn() because in line buffering mode it was the
"dangerous" one that had no newline thus no flush, but then when we go to block
buffering mode xputs() needs a flush just like xputsn() would, and MAYBE it's
good to have the flush because in line buffer mode it would be a NOP? Except the
command selected block buffering mode precisely BECAUSE it didn't want to flush
each line, so why should xputs() do it when the command asked not to? And if
xputs() is doing it, why is xprintf() _not_ doing it? And if xprintf() _is_
doing it, then we're back to basically not buffering the output...

> tell the user what the escape char is) and stdout is flushed by handle_esc.
> To fix this we either make xputs() flush automatically, or we just add a single
> line of manual flush() after xputs() in microcom.c.
> Either is fine with me.

When I searched for the first xputs in microcom I got:

  xputsn("\r\n[b]reak, [p]aste file, [q]uit: ");
  if (read(0, &input, 1)<1 || input == CTRL('D') || input == 'q') {

Which is a separate function (the n version is no newline, it does not add the
newline the way libc puts() traditionally does), with its own flushing
semantics: xputsn() doesn't call xputs(), and neither calls or is called by
xprintf(). "Some functions flush, some functions don't" is a bit of a design
sharp edge.

The larger problem is that manual flushing of the output buffer is a terrible
interface, and leads to missing error checking. Without that checking a command
won't reliably exit when its output terminal closes, because the whole SIGPIPE
thing was its own can of worms that even bionic used to manually mess with.
Which is why I originally made toybox not ever do that (systemic fix) but I got
complaints about performance.

Your other patch changes a bunch of xprintf() to dprintf() which is even _more_
fun because dprintf() writes directly to the file descriptor (bypassing the
buffer in the libc global FILE * instance "stdout"), which means in the absence
of manual flushing the dprintf() data will go out first and then the stdio data
will go out sometime later, in the wrong order. Mixing the two output formats
tends to invert the order that way, unless you disable output buffering.
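
The inversion is easy to reproduce against a temp file. A minimal sketch
(mixed_output_order() is a made-up name, and the exact result depends on the
libc doing a plain write() at the current file offset when it flushes, which
is what I'd expect from glibc and musl):

```c
#define _POSIX_C_SOURCE 200809L  // dprintf(), fileno(), pread()
#include <stdio.h>
#include <unistd.h>

// Write "A" through stdio and "B" through the raw fd of the same temp file,
// then read back what landed in the file. Returns the number of bytes read
// (out is NUL terminated), or -1 on error.
int mixed_output_order(char *out, int len, int flush_first)
{
  FILE *fp = tmpfile();
  int fd, n;

  if (!fp) return -1;
  fd = fileno(fp);

  fputs("A", fp);               // sits in the FILE * buffer (block buffered)
  if (flush_first) fflush(fp);  // push "A" to the fd before writing "B"
  dprintf(fd, "B");             // bypasses the buffer, goes straight to fd
  fflush(fp);                   // without flush_first, "A" only arrives now

  n = pread(fd, out, len - 1, 0);
  if (n >= 0) out[n] = 0;
  fclose(fp);

  return n;
}
```

With flush_first set the file contains "AB". Without it the dprintf() data
wins the race and the file contains "BA".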

I like systematic solutions that mean the problem never comes up again. Elliott
has been advocating the whack-a-mole "fix them as we find them" approach here
which I am not comfortable with. I've been leaning towards adding a
TOYFLAG_NOBUF so we can select a third buffering type, and go back to "do not
buffer output at all, flush after every ANSI write function" for at least some
commands I'd like to be able to reason about. Especially for commands like this
where output performance really doesn't seem like it should be an issue.
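
In plain libc terms the three buffering types map onto setvbuf(). A sketch
with hypothetical names (this is not how toybox wires up its TOYFLAG values,
just the underlying libc call):

```c
#include <stdio.h>

// Hypothetical names for the three buffering types discussed above.
enum out_buffering { OUT_NOBUF, OUT_LINEBUF, OUT_BLOCKBUF };

// Select a buffering type for fp. Should be called before any other I/O on
// the stream. Returns 0 on success, nonzero on failure.
int set_out_buffering(FILE *fp, enum out_buffering kind)
{
  static const int modes[] = { _IONBF, _IOLBF, _IOFBF };

  // A NULL buffer lets libc allocate one of the requested size itself.
  return setvbuf(fp, 0, modes[kind], kind == OUT_NOBUF ? 0 : BUFSIZ);
}
```

With OUT_NOBUF every stdio write goes to the file descriptor immediately, so a
prompt without a trailing newline shows up without any manual fflush().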

And OTHER implementations can't consistently get this right, which is why whether:

  for i in {1..100}; do echo -n .; sleep .1; done | less

Produces any output before 10 seconds have elapsed is potluck, and varies from
release to release of the same distro.

Oh, and the gnu/crazies just came up with a fourth category of write as a
gnu/extension: flush after NUL byte.

https://lists.gnu.org/archive/html/coreutils/2024-03/msg00016.html

It's very gnu to fix "this is too complicated to be reliable" by adding MORE
complication. Note how the problem WE hit here was 1) we didn't ask for LINEBUF
mode, 2) \r doesn't count as a line for buffer flushing purposes anyway, 3) the
new feature making it trigger on NUL instead _still_ wouldn't make \r count as a
line for buffer flushing purposes.

My suggestion for a "proper fix" to the problem _category_ of small writes being
too expensive was to have either libc or the kernel use nagle's algorithm for
writes from userspace, like it does for network connections. (There was a fix to
this category of issue decades ago, it just never got applied HERE.)

But that hasn't been popular, and it's a pain to implement in 

Re: [Toybox] [PATCH] xputs: Do flush

2024-05-18 Thread Rob Landley
On 5/16/24 06:46, Yi-Yo Chiang via Toybox wrote:
> The comment string claims xputs() to write, flush and check error.
> However the 'flush' operation is actually missing due to 3e0e8c6
> changing the default buffering mode from 'line' to 'block'.

That's sort of an Elliott question?

Originally, xprintf() and friends all flushed (which is necessary to detect
output errors and xexit() if so), but Elliott complained that was too slow, so
the flushes got removed, and then we changed the default stdout buffering type,
and...
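
For context, "flush so you can detect output errors" looks roughly like this.
It is a stand-in written for this explanation, not toybox's xprintf(), and it
returns -1 where the real thing would exit:

```c
#include <stdarg.h>
#include <stdio.h>

// Hypothetical stand-in for the behaviour described above: format, flush,
// and report failure so the caller can bail out. Returns the number of
// characters written, or -1 if formatting or the flush failed.
int printf_flush_check(FILE *fp, const char *fmt, ...)
{
  va_list va;
  int rc;

  va_start(va, fmt);
  rc = vfprintf(fp, fmt, va);
  va_end(va);

  // Without the fflush() a write error may not surface until much later.
  if (rc < 0 || fflush(fp) || ferror(fp)) return -1;

  return rc;
}
```

The cost is one write() per call, which is the performance complaint in a
nutshell.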

Alas, it was a whole multi-year thing. Elliott has volunteered to put manual
flushes everywhere it's a problem. I've seriously thought about going
exclusively to file descriptor output (dprintf() is in posix now) and leaving
FILE * for input only.

Personally, I honestly believe the _proper_ fix is to upgrade the kernel to use
vdso to implement nagle's algorithm on file descriptor 1:

https://landley.net/notes-2024.html#28-04-2024

But I'm not holding my breath.

Rob

P.S. I should post some subset of
https://landley.net/bin/mkroot/latest/linux-patches/ to linux-kernel again. So
they can be ignored again.


Re: [Toybox] netcat -f bug

2024-05-15 Thread Rob Landley
On 5/11/24 02:11, Yi-Yo Chiang wrote:
> On Sat, May 11, 2024 at 1:30 AM Rob Landley wrote:
> 
> What's your use case triggering this patch? Because without that, I go off on
> various design tangents, as seen below:
> 
> I just wanted some tool to communicate with a pty or socket node on android.
> Wanted a program to be able to send/recv towards a duplex data stream. (more
> precisely I want a command that does exactly what pollinate() does)
> Since neither socat nor minicom is available on Android, I'm just using
> `stty raw -echo && nc -f` to "talk" to my pty.
> 
> Why didn't I use <> redirector? Because I wasn't aware of that feature before
> reading this mail...
> Let me fiddle with it a bit:
> 
> cat <>/dev/pts/0
>> Shows the pts output, but my input doesn't get passed back

Sorry for sitting on this, my confusion here is I don't know what /dev/pts/0
means in your test, and the pts man page isn't illuminating. It doesn't seem to
be special, it just seems to be the first one allocated? (So who allocated it on
android?)

According to "tty" in a random command line tab that one's using /dev/pts/17,
and ps ax | grep pts/0 says it's PID 14597 a random bash instance, so I don't
think the test lines up on a debian+xfce laptop.

What is your test trying to _do_? (What process are you talking to?)

> yeah like you said it should have fallen through and be like -l.
> However digging the git history the fall through line got removed here
> https://github.com/landley/toybox/commit/67bf48c1cb3ed55249c27e6f02f5c938b20e027d
> which is unintentional I think?

Yeah, lack of automated regression testing for this, which is why I want to
understand and fix the test...

Rob


Re: [Toybox] strlower() bug

2024-05-14 Thread Rob Landley


On 5/14/24 12:12, enh wrote:
> On Tue, May 14, 2024 at 1:04 PM Rob Landley wrote:
>>
>> On 5/14/24 07:10, enh wrote:
>> > macOS tests seem to be broken since this commit?
>> >
>> > FAIL: find strlower edge case
>> > echo -ne '' | touch aⱥ; find . -iname aȺ
>> > --- expected 2024-05-10 17:32:56.0 +
>> > +++ actual 2024-05-10 17:32:56.0 +
>> > @@ -1 +0,0 @@
>> > -./aⱥ
>>
>> Sigh. Apple's handling of utf8/unicode continues to be... "a challenge".
>>
>> When I run "make test_find" standalone, it gives me:
>>
>> scripts/runtest.sh: line 219: syntax error near unexpected token `;'
>> scripts/runtest.sh: line 219: `  R) LEN=0; B=1; ;&'
>>
>> Because bash 3.2 from 2007 doesn't understand ;&
> 
> yeah, nor does mksh. it hasn't caused me any problems though; i've
> been ignoring it for years now.
> 
>> And THEN it goes:
>>
>> touch: out of range or illegal time specification: -MM-DDThh:mm:SS[.frac][tz]
>> touch: out of range or illegal time specification: -MM-DDThh:mm:SS[.frac][tz]
>> FAIL: find newerat
>> echo -ne '' | find dir -type f -newerat @12345
>> --- expected2024-05-14 11:16:40.0 -0500
>> +++ actual  2024-05-14 11:16:40.0 -0500
>> @@ -1 +0,0 @@
>> -dir/two
>>
>> Which is a different error that DOESN'T happen with the global tests, because
>> those are using toybox touch rather than homebrew's $TOUCH. But it works on
>> debian. Let's see:
>>
>> $ touch --version
>> touch: illegal option -- -
>> usage: touch [-A [-][[hh]mm]SS] [-achm] [-r file] [-t [[CC]YY]MMDDhhmm[.SS]]
>>[-d -MM-DDThh:mm:SS[.frac][tz]] file ...
>>
>> Thank you, gnu project. I'm gonna assume this is _also_ from 2007. (I made
>> scripts/prereq/build.sh for a REASON...)
> 
> no, i think this is a BSD touch.
> 
> yeah, that looks very like the FreeBSD touch's usage:
> 
> static void
> usage(const char *myname)
> {
> fprintf(stderr, "usage: %s [-A [-][[hh]mm]SS] [-achm] [-r file] "
> "[-t [[CC]YY]MMDDhhmm[.SS]]\n"
> "   [-d -MM-DDThh:mm:SS[.frac][tz]] "
> "file ...\n", myname);
> exit(1);
> }
> 
> 
>> Then when I run "make clean macos_defconfig tests" I get:
>>
>> Undefined symbols for architecture arm64:
>>   "_iconv", referenced from:
>>   _do_iconv in iconv.o
>>  (maybe you meant: _iconv_main)
>>   "_iconv_open", referenced from:
>>   _iconv_main in iconv.o
>> ld: symbol(s) not found for architecture arm64
>>
>> Because the Makefile has:
>>
>> tests: ASAN=1
>> tests: toybox
>> scripts/test.sh
>>
>> And ASAN apparently breaks on homebrew's toolchain but not debian's toolchain.
>> Why does it break there but not on Linux...
>>
>> probe cc -Wall -Wundef -Werror=implicit-function-declaration
>> -Wno-char-subscripts -Wno-pointer-sign -funsigned-char
>> -Wno-deprecated-declarations -Wno-string-plus-int -Wno-invalid-source-encoding
>> -fsanitize=address -O1 -g -fno-omit-frame-pointer -fno-optimize-sibling-calls
>> -xc -o /dev/null -
>> error: cannot parse the debug map for '/dev/null': The file was not recognized
>> as a valid object file
>> clang: error: dsymutil command failed with exit code 1 (use -v to see invocation)
>>
>> Because it tries to read back the -o output we discarded, and fails when it
>> can't do so, thus all library probes fail and it tries to build with no
>> libraries. But only when ASAN is enabled, because ASAN uses -o as INPUT. Bravo.
>>
>> None of this is the actual unicode failure, this is just ambient macos...

FAIL: find strlower edge case
echo -ne '' | touch aⱥ; find . -iname aȺ
--- expected2024-05-14 13:32:19.0 -0500
+++ actual  2024-05-14 13:32:19.0 -0500
@@ -1 +0,0 @@
-./aⱥ
make: *** [tests] Error 1
cfarm104 (homebrew):toybox landley$ ls generated/testdir/testdir/
a?
$ LC_ALL=en_US.UTF-8 ls generated/testdir/testdir
a?
$ generated/testdir/ls generated/testdir/testdir
a\342\261\245\342\261\245\342\261\245\342\261\245\342\261\245\342\261\245\342\261\245\342\261\245\342\261\245
$ echo -./aⱥ
-./aⱥ
$ generated/testdir/ls -N generated/testdir/testdir
aⱥ
cfarm104 (homebrew):toybox landley$ generated/testdir/ls -N
g

Re: [Toybox] strlower() bug

2024-05-14 Thread Rob Landley
On 5/14/24 07:10, enh wrote:
> macOS tests seem to be broken since this commit?
> 
> FAIL: find strlower edge case
> echo -ne '' | touch aⱥ; find . -iname aȺ
> --- expected 2024-05-10 17:32:56.0 +
> +++ actual 2024-05-10 17:32:56.0 +
> @@ -1 +0,0 @@
> -./aⱥ

Sigh. Apple's handling of utf8/unicode continues to be... "a challenge".

When I run "make test_find" standalone, it gives me:

scripts/runtest.sh: line 219: syntax error near unexpected token `;'
scripts/runtest.sh: line 219: `  R) LEN=0; B=1; ;&'

Because bash 3.2 from 2007 doesn't understand ;&

And THEN it goes:

touch: out of range or illegal time specification: -MM-DDThh:mm:SS[.frac][tz]
touch: out of range or illegal time specification: -MM-DDThh:mm:SS[.frac][tz]
FAIL: find newerat
echo -ne '' | find dir -type f -newerat @12345
--- expected2024-05-14 11:16:40.0 -0500
+++ actual  2024-05-14 11:16:40.0 -0500
@@ -1 +0,0 @@
-dir/two

Which is a different error that DOESN'T happen with the global tests, because
those are using toybox touch rather than homebrew's $TOUCH. But it works on
debian. Let's see:

$ touch --version
touch: illegal option -- -
usage: touch [-A [-][[hh]mm]SS] [-achm] [-r file] [-t [[CC]YY]MMDDhhmm[.SS]]
       [-d YYYY-MM-DDThh:mm:SS[.frac][tz]] file ...

Thank you, gnu project. I'm gonna assume this is _also_ from 2007. (I made
scripts/prereq/build.sh for a REASON...)

Then when I run "make clean macos_defconfig tests" I get:

Undefined symbols for architecture arm64:
  "_iconv", referenced from:
  _do_iconv in iconv.o
 (maybe you meant: _iconv_main)
  "_iconv_open", referenced from:
  _iconv_main in iconv.o
ld: symbol(s) not found for architecture arm64

Because the Makefile has:

tests: ASAN=1
tests: toybox
	scripts/test.sh

And ASAN apparently breaks on homebrew's toolchain but not debian's toolchain.
Why does it break there but not on Linux...

probe cc -Wall -Wundef -Werror=implicit-function-declaration
-Wno-char-subscripts -Wno-pointer-sign -funsigned-char
-Wno-deprecated-declarations -Wno-string-plus-int -Wno-invalid-source-encoding
-fsanitize=address -O1 -g -fno-omit-frame-pointer -fno-optimize-sibling-calls
-xc -o /dev/null -
error: cannot parse the debug map for '/dev/null': The file was not recognized
as a valid object file
clang: error: dsymutil command failed with exit code 1 (use -v to see invocation)

Because it tries to read back the -o output we discarded, and fails when it
can't do so, thus all library probes fail and it tries to build with no
libraries. But only when ASAN is enabled, because ASAN uses -o as INPUT. Bravo.

None of this is the actual unicode failure, this is just ambient macos...

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


[Toybox] I'm aware landley.net is saying "site.not found".

2024-05-13 Thread Rob Landley
That dreamhost server migration they did? (The recent "only 2 years old" version
thread?) Does not seem to have correctly updated the DNS record. Of the domain
they manage for me.

Dreamhost continues to provide nine fives of uptime. I've pinged support
already, they'll probably get back to me in the morning.

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] today in "shut up, gnu!"

2024-05-12 Thread Rob Landley
On 4/12/24 13:24, enh via Toybox wrote:
> ~/aosp-main-with-phones$ find external/ -name NOTICE -type l -maxdepth 2
> find: warning: you have specified the global option -maxdepth after
> the argument -name, but global options are not positional, i.e.,
> -maxdepth affects tests specified before it as well as those specified
> after it.  Please specify global options before other arguments.

Looking back at this (ok, closing tabs), I think I implemented this the same as
any other option, so you can "-type l -o -maxdepth 2" and friends. The thing is,
when maxdepth triggers it returns without recursing on the path being evaluated,
so you'd have to "-type d -o -maxdepth 2" for the difference to matter.
(Recursing into lower entries but not triggering on them.)

But it's not "global" in any magic way. It's just... another option? Which
doesn't seem WRONG... And of course
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/find.html hasn't got
maxdepth.

Meanwhile, busybox has:

//config:config FEATURE_FIND_MAXDEPTH
//config:   bool "Enable -mindepth N and -maxdepth N"
//config:   default y
//config:   depends on FIND

And:

#define INIT_G() do { \
        setup_common_bufsiz(); \
        BUILD_BUG_ON(sizeof(G) > COMMON_BUFSIZE); \
        /* we have to zero it out because of NOEXEC */ \
        memset(&G, 0, sizeof(G)); \
        IF_FEATURE_FIND_MAXDEPTH(G.minmaxdepth[1] = INT_MAX;) \
        IF_FEATURE_FIND_EXEC_PLUS(G.max_argv_len = bb_arg_max() - 2048;) \
        G.need_print = 1; \
        G.recurse_flags = ACTION_RECURSE; \
} while (0)

And I miss the days when I worked on that project and it was SIMPLE. I liked
simple. That's what attracted me to it in the first place...

https://git.busybox.net/busybox/commit/?id=053c12e0de30

Yeah, I'm not even trying to understand that right now. I'll take my 730 lines
over their 1750 lines any day, I don't CARE who has the smaller binary size
after stripping specific ELF table entries...

Anyway, I should come up with a test for maxdepth acting as a normal option vs
acting as a global option...

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] nproc(1)

2024-05-11 Thread Rob Landley
Relevant blog entry is https://landley.net/notes-2022.html#26-07-2022

> Meanwhile, I found out that musl has a bug! The nproc command has two
> modes, the default shows available processors (as modified by taskset),
> and nproc --all shows installed processors (whether or not the current process
> can schedule on them). One codepath is _SC_NPROCESSORS_CONF and the other
> is _SC_NPROCESSORS_ONLN. Except musl does ONLN for both, it hasn't got the
> second codepath, which according to strace is checking /sys/devices/system/cpu
> in glibc, and the bionic source has a comment saying that /proc/cpuinfo
> works fine on x86 but arm is broken because arm filters out the
> taskset-unavailable processors from that, so you have to look at the sysfs
> one to work around the arm bug.

And then me ruminating that mkroot is all single processor emulations so testing
this is once again a design issue...

Pretty sure I poked Rich about it at the time, but I just confirmed that musl
still has the bug in today's git. And the above bionic note is apparently why my
code is looking at sysfs to get the data, and "strace nproc --all" on debian
says that's what they're doing too, and ltrace says it's doing the getconf() so
yes glibc is also doing it.

Musl will apparently allow itself to read data out of /proc, or at least there's
13 hits in the current codebase, but has zero instances of reading out of /sys.
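
To make the ONLN side concrete: the count in the musl snippet quoted below is
a population count over the sched_getaffinity() mask, so it can only ever see
CPUs the current process is allowed to run on. A minimal sketch of just that
counting step, with mask_cpu_count() as a made-up name for illustration (it is
not a musl or toybox function):

```c
#include <stddef.h>

// Count the set bits in an affinity mask, one bit per schedulable CPU.
// b &= b-1 clears the lowest set bit, so the inner loop runs once per CPU.
// mask_cpu_count() is a hypothetical name, not a musl or toybox function.
int mask_cpu_count(const unsigned char *mask, size_t len)
{
  int cnt = 0;
  size_t i;

  for (i = 0; i < len; i++) {
    unsigned char b = mask[i];

    for (; b; b &= b-1) cnt++;
  }

  return cnt;
}
```

Feed it the mask from sched_getaffinity() and taskset shrinks the answer, which
is what plain nproc wants and not what --all wants.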

Rob

On 5/2/24 11:20, enh wrote:
> (to be fair, i was shocked the first time i had to deal with an
> Android device where these weren't both the same...)
> 
> On Thu, May 2, 2024 at 9:18 AM enh wrote:
>>
>> /facepalm
>>
>> maybe move your hand-written version into portability just for musl,
>> and everyone with a working libc just uses sysconf()?
>>
>> On Tue, Apr 30, 2024 at 8:26 PM Rob Landley wrote:
>> >
>> > On 4/29/24 16:56, enh via Toybox wrote:
>> > > isn't nproc(1) just a call to sysconf(3) with either
>> > > _SC_NPROCESSORS_ONLN for regular behavior, or _SC_NPROCESSORS_CONF for
>> > > --all?
>> >
>> > From musl src/conf/sysconf.c:
>> >
>> >     case JT_NPROCESSORS_CONF & 255:
>> >     case JT_NPROCESSORS_ONLN & 255: ;
>> >         unsigned char set[128] = {1};
>> >         int i, cnt;
>> >         __syscall(SYS_sched_getaffinity, 0, sizeof set, set);
>> >         for (i=cnt=0; i < sizeof set; i++)
>> >             for (; set[i]; set[i]&=set[i]-1, cnt++);
>> >         return cnt;
>> >
>> > Musl returns the same thing for "conf" and "online".
>> >
>> > Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] stty bug

2024-05-11 Thread Rob Landley
On 5/10/24 06:15, Yi-Yo Chiang via Toybox wrote:
> The _negate & combination_ type of settings are bugged.
> 
> `stty cooked` and `stty raw` works fine, but the negated options:
> 
> $ stty -raw
> stty: unknown option: cooked
> $ stty -cooked
> stty: unknown option: raw

Ack, added to the notes for that command. (It's in pending for a reason...)

Thanks,

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] unshare/nsenter and flags

2024-05-11 Thread Rob Landley
On 5/10/24 18:46, Yifan Hong wrote:
> I am running all commands as a non-root user. Here are the two commands I run:
> 
> strace ./toybox unshare --mount --map-root-user --user /bin/bash -c 'echo' 2>&1 | tee /tmp/user.txt
> strace ./toybox unshare --mount --map-root-user /bin/bash -c 'echo' 2>&1 | tee /tmp/no_user.txt
> strace unshare --mount --map-root-user /bin/bash -c 'echo' 2>&1 | tee /tmp/no_user_linux.txt

$ unshare --mount --map-root-user --user /bin/bash -c echo
unshare: unshare failed: Operation not permitted

That's on my host devuan. Let's see about newer...

Ah, booting a daedalus ISO under KVM, the command works. Looks like they added
(enabled?) new kernel plumbing between 3.0 and 5.0.

> Got about half my laptop tabs closed so far! Working towards a reboot...

Ok, time to bite the bullet and finish that, if I need the upgrade to test a 
fix...

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] strlower() bug

2024-05-10 Thread Rob Landley
On 5/8/24 16:27, Ray Gardner wrote:
> BTW I was a bit surprised that mentioning my awk for toybox got no reaction.

Oh I'm interested, but somebody (probably you) mentioned they were looking into
it before, and I'll wait to see some code first. :)

(The problem with asking to see code early is pending/git.c isn't useful and
that's as far as the original developer had time/energy for, and I haven't
personally opened that can of worms yet. The problem with waiting until it's
done is pending/bc.c was several times larger than I expected and I'm uncertain
if I even want to personally open that can of worms.)

That said, if you're actively working on it and wanted to do a brief design
infodump here, consider it solicited. :)

Rob

P.S. Also, my old Austin house finally went on the market last weekend and we
got a lowball bid two days later that the realtor was doing the GO GO GO DON'T
STOP TO THINK ACT NOW SUPPLIES RUNNING OUT pressure thing you see in most scams,
because apparently houses and bananas have a similar lifespan and if it's on the
market for longer than it takes anyone outside the realtor's immediate friend
circle to find out about it the world will end. So now there's all the paperwork
in the world. Fade and I had to get a printout notarized on wednesday. They're
asking when we last had all the plumbing and wiring in the walls replaced. (Is
that a thing people regularly do after houses are built?) I had to docusign a
leaded paint affidavit addendum. The old bank that holds the mortgage we're
paying off called me to try to upsell me on a NEW mortgage (we're renting for
now), and the person wanting to "transfer me to an agent" wouldn't get off the
phone for 20 minutes. And Wells Fargo got our addresses updated so it says our
checking account type is changing to one that charges us a $10 fee/month for
existing. Anyway, if I seem a bit distracted right now it's because I am.
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] netcat -f bug

2024-05-10 Thread Rob Landley
What's your use case triggering this patch? Because without that, I go off on
various design tangents, as seen below:

On 5/10/24 06:09, Yi-Yo Chiang via Toybox wrote:
> Hi,
> The -f option for netcat doesn't seem to be doing anything right now.

I should have a test for that, but to be honest I came up with netcat -f back in
busybox (commit 1cca9484db69 says 2006) before I knew about bash's <> redirector
to open a file for both reading _and_ writing (or had bash not added it yet?),
meaning the example in that commit probably _should_ have been stty 115200 -F
/dev/ttyS0 && stty raw -echo -ctlecho && cat <>/dev/ttyS0 >&0 2>&0
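
For comparison, here's roughly what "<>/dev/ttyS0 >&0 2>&0" boils down to in
C: one open() with O_RDWR, then dup() for the other two handles. This is a
sketch, not netcat code. open_once_dup_twice() is a hypothetical helper, and it
hands back whatever descriptor numbers are free instead of forcing 0-2:

```c
#include <fcntl.h>
#include <unistd.h>

// Open path once for reading and writing, then dup() it twice so all three
// descriptors refer to the same open file (the device sees a single open()).
// Returns 0 on success, -1 on failure with nothing left open.
// open_once_dup_twice() is a hypothetical name for this example.
int open_once_dup_twice(const char *path, int fds[3])
{
  fds[0] = open(path, O_RDWR);
  if (fds[0] < 0) return -1;

  fds[1] = dup(fds[0]);
  fds[2] = (fds[1] < 0) ? -1 : dup(fds[0]);
  if (fds[2] < 0) {
    if (fds[1] >= 0) close(fds[1]);
    close(fds[0]);

    return -1;
  }

  return 0;
}
```

Opening the node three separate times is the thing that gave -EBUSY or reset
the UART. The dup() copies avoid that because the driver never sees a second
open().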

(I should NOT ask Chet for "{0-2}<>/dev/ttyS0" syntax operating on a filehandle
range. I should not do it. That would be... I dunno, rude? I mean in theory I'd
just want him to fix the existing {1..2} syntax to do one open() and then dup()
redirects instead of opening the device multiple times, which was the initial
problem because reopening the /dev node instead of dup() an existing filehandle
to it either gave -EBUSY or hardware reset the UART depending on the underlying
driver, and the reason chet would give me a LOOK if I asked is {brace,expansion}
is resolved _before_ variable expansion and redirection, so it literally turns
INTO 3 arguments with different numbers and thus three separate open() calls to
the char device, and making it do something else is basically a layering
violation...)

Ahem. Sorry. Tangent.

It's possible netcat -ft makes it still useful, but A) that implies there should
be some sort of tty wrapper in the nice/taskset/time/chroot/nohup mold, B) I
think -t is currently broken because I needed to rewrite it to add nommu support
(decompose forkpty() into the underlying openpty() and login_tty() calls around
the vfork() instead of fork()) and just commented it out and put it on the todo
list...

The original theory was -f should fall through to the "else" case on line 191,
and thus naturally inherit any other applicable options. Which is hard to see in
my current tree because it has a bunch of half-finished work in it:

$ git diff toys/*/netcat.c | diffstat
 netcat.c |   62 +-
 1 file changed, 49 insertions(+), 13 deletions(-)

Sorry for falling behind...

> It is
> missing a call to pollinate() after opening the specified device file.
> The patch adds back that line of pollinate().

Which makes it not work with running commands (ala -f should work like -l).

> Also make sure that the timeout handler is not armed for -f mode as -f
> shouldn't timeout. File open() should just succeed or fail immediately.

Why shouldn't -f timeout? Various /dev nodes take a while to open, automount
behind the scenes... Is there a downside to leaving that part as is? (Other than
the new case you added not alarm(0) disarming it?)

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] unshare/nsenter and flags

2024-05-10 Thread Rob Landley
Ok, cycling back to this...

On 5/2/24 21:51, enh wrote:
>> > it seems like -r _doesn't_ actually imply -U in practice (and they
>> > seemed to have strace output to prove it).
>>
>> So... should it?
> 
> i think so? i have no idea about any of this, but
> https://man7.org/linux/man-pages/man1/unshare.1.html says
> 
>        -r, --map-root-user
>            Run the program only after the current effective user and
>            group IDs have been mapped to the superuser UID and GID in
>            the newly created user namespace. This makes it possible to
>            conveniently gain capabilities needed to manage various
>            aspects of the newly created namespaces (such as configuring
>            interfaces in the network namespace or mounting filesystems
>            in the mount namespace) even when run unprivileged. As a mere
>            convenience feature, it does not support more sophisticated
>            use cases, such as mapping multiple ranges of UIDs and GIDs.
>            This option implies --setgroups=deny and --user. This option
>            is equivalent to --map-user=0 --map-group=0.
> 
> which sounds like it supports the toybox documentation rather than the
> toybox source?
> 
>> What did they try to do, and what did they _want_ to happen?
> 
> unshare --mount --map-root-user /bin/sh -c "mount --bind $A $B"

Running that as my normal user gave EPERM on the unshare(CLONE_NEWNS) which is
the reason I haven't poked at this more. (To be useful, it seems like it
probably needs to be setuid and then drop permissions after unsharing stuff, and
I need to come up to speed on the security implications of that and possibly
write a "contain" command with as little novelty as possible. Which is not a can
of worms I want to open without a clear desk...)

Running it under sudo I got:

openat(AT_FDCWD, "/proc/self/setgroups", O_WRONLY) = 3
write(3, "deny", 4) = -1 EPERM (Operation not permitted)

> they looked at strace for toybox and saw
> 
> unshare(CLONE_NEWNS)= -1 EPERM (Operation not permitted)
> 
> but for the util-linux one they saw
> 
> unshare(CLONE_NEWNS|CLONE_NEWUSER)  = 0

Are they root or a normal user? Because adding -U to the above command line I 
got:

geteuid()   = 1000
getegid()   = 1000
unshare(CLONE_NEWNS|CLONE_NEWUSER)  = -1 EPERM (Operation not permitted)

But with sudo, that succeeded and adding an ls -l to the bash command yes it did
the bind mount, which is gone again when it exits.

>> The "22.04" means it came out two years and one month ago, and that's what
>> they're migrating me TO. So, you know, I can presumably feel less bad about
>> my laptop...
> 
> (to be fair, until _last week_ that was the current LTS release :-)
> but, yeah, odd timing unless they deliberately like to be on the
> previous LTS release! i'll throw no stones as long as i'm living so
> close to the Android build server glass house though...)

Got about half my laptop tabs closed so far! Working towards a reboot...

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] strlower() bug

2024-05-08 Thread Rob Landley
On 5/6/24 17:12, Ray Gardner wrote:
> While working on an awk implementation for toybox, I found a bug in
> strlower(), which is used only in find.c. I've attached some tests to
> put in find.test to reveal it. I can't put them here directly because
> I don't think the UTF-8 names will come through. (I modelled my awk
> tolower()/toupper() code on your strlower().)

Your test doesn't create the files you're finding, so find is supposed to fail?
Your first test doesn't barf under ASAN, and then the second one's going to fail
because echo -n | wc says it's 258 bytes and the VFS file length limit is 255
bytes, so there CAN'T be a file named that on Linux. (Path length != path
component length, there's no slashes in there.)

> The problem is in the test if the output string needs to be enlarged
> to take an expanded lowercase:
> // Case conversion can expand utf8 representation, but with extra mlen
> // space above we should basically never need to realloc
> if (mlen+4 > (len = new-try)) continue;
> 
> The mlen+4 needs to be mlen-4 to leave at least 4 bytes for the next 
> character.

Hmmm, possibly. I still don't understand what your test case is testing. (Just
trying to trigger an ASAN violation with an otherwise nonsense test?)

> As the comment indicates, it should "never" need to realloc;

No, the first comment is "never" because triggering probably indicates a libc
bug (we converted it from valid utf8 to a unicode code point, ran it through
libc's towlower(), and are now trying to convert the result _back_ to utf8, an
encoding hiccup at this point seems unlikely? But I don't trust locale plumbing
ever, so...)

The second is "basically never" because it requires an insane input string, but
that's user controlled and users do crazy things, sometimes even intentionally.

> it takes
> a very long name of uppercase characters that do expand when made
> lowercase. But the code is there to handle that very case.

The first malloc rounds the allocation up to next 8 byte boundary _after_ what
it's actually using, so 9-16 bytes of zeroes at the end, and assuming the
conversion only ever grows 1 byte (I don't remember the pathological expansion
case, it's in my blog somewhere, but your test is turning c8 ba into e2 b1 a5
which is 1 byte of expansion) then you need at least 8 expanding unicode code
points to burn through the padding, so your first test string is too short to
trigger a problem. And your second is too long to produce a valid filename, so
the test can't _succeed_...
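
The byte counts are easy to check by hand. A minimal sketch of a UTF-8 encoder
(utf8_encode() is a hypothetical helper written for this example, not the
function toybox uses) showing U+023A taking 2 bytes and its lowercase form
U+2C65 taking 3:

```c
#include <stddef.h>

// Encode one code point as UTF-8 into out[] (room for 4 bytes), returning
// the number of bytes written, or 0 for surrogates and out of range values.
// utf8_encode() is a hypothetical helper, not the toybox implementation.
size_t utf8_encode(unsigned cp, unsigned char *out)
{
  if (cp < 0x80) {
    out[0] = cp;

    return 1;
  }
  if (cp < 0x800) {
    out[0] = 0xc0 | (cp >> 6);
    out[1] = 0x80 | (cp & 0x3f);

    return 2;
  }
  if (cp < 0x10000) {
    if (cp >= 0xd800 && cp <= 0xdfff) return 0;
    out[0] = 0xe0 | (cp >> 12);
    out[1] = 0x80 | ((cp >> 6) & 0x3f);
    out[2] = 0x80 | (cp & 0x3f);

    return 3;
  }
  if (cp < 0x110000) {
    out[0] = 0xf0 | (cp >> 18);
    out[1] = 0x80 | ((cp >> 12) & 0x3f);
    out[2] = 0x80 | ((cp >> 6) & 0x3f);
    out[3] = 0x80 | (cp & 0x3f);

    return 4;
  }

  return 0;
}
```

Encoding 0x23a yields c8 ba and 0x2c65 yields e2 b1 a5, matching the one byte
of growth per character described above.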

Sigh, lemme come up with a test that demonstrates the fix working... the minimal
one seems to be ./find . -iname aȺ

And then, of course, TEST_HOST fails because I need to enable a utf8 locale, but
I made plumbing for that recent-ish-ly...

commit 6800a95ef328

> BTW, when I run those tests, they "PASS", but show as aborted:
> corrupted size vs. prev_size
> scripts/runtest.sh: line 137: 265983 Aborted find .
> -iname 
> AC
> PASS: find utf8 uppercase long name

Odd.

> The test echos and checks the $? return code and the abort apparently
> leaves that as 0.

That could be anything from a bash issue to your distro's libc. The only trap in
tests.sh is for SIGINT, and that handler isn't inherited by child processes. The
return code of a process killed by a signal should be 128+signum, which the test
plumbing would notice if it was the actual exit code of your shell snippet.
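
As a sanity check on the 128+signum convention, here's a sketch that forks a
child, has it die from a signal, and converts the wait status the way a shell
does for $?. shell_status() and status_after_signal() are names invented for
this example, and the numbers assume the usual Linux values (SIGABRT is 6,
SIGSEGV is 11):

```c
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Map a waitpid() status to what a shell would report in $?: the exit code
// for a normal exit, 128+signal when the process was killed by a signal.
int shell_status(int wstatus)
{
  if (WIFEXITED(wstatus)) return WEXITSTATUS(wstatus);
  if (WIFSIGNALED(wstatus)) return 128 + WTERMSIG(wstatus);

  return -1;
}

// Fork a child that kills itself with sig and return the shell-style status.
// Returns -1 if fork() or waitpid() fails. (Hypothetical helper.)
int status_after_signal(int sig)
{
  struct rlimit no_core = {0, 0};
  int wstatus;
  pid_t pid = fork();

  if (pid < 0) return -1;
  if (!pid) {
    setrlimit(RLIMIT_CORE, &no_core); // don't leave core files around
    signal(sig, SIG_DFL);             // make sure the default action applies
    kill(getpid(), sig);
    _exit(0);                         // only reached if the signal didn't kill us
  }
  if (waitpid(pid, &wstatus, 0) != pid) return -1;

  return shell_status(wstatus);
}
```

A fault interceptor that swallows the signal and exits 0 would show up here as
the WIFEXITED() branch returning 0 instead of 134 or 139.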

I checked in a test that should actually succeed, but would fail with ASAN
enabled before the bug was fixed.

> Is there a way to fix the test system so it can
> force the exit code to be something else?

Not if the signal/exit isn't allowed to propagate back to it by the test. You
ran a child process and then unconditionally did an ;echo $? meaning test.sh
doesn't get notified of the child process getting killed by a signal, it
unconditionally (because ;) went on to run a second command, "echo" which is
returning whatever your bash recorded.

Some distros have horrible fault interceptors that log crap into syslog or dmesg
or some such, AND THEN RETURN SUCCESS. (Which is doubly insane: A) a program
faulting does not need to be globally logged on a development system, B)
returning success when that happens is very sad, but their "logic" was that some
scripts would otherwise misbehave.)

> When I run the test from a
> command line directly in bash, it gets a code of 134 (SIGABRT).

Without ASAN I'm getting 139 (128+11 = SIGSEGV). There would appear to be a
difference in our environments.

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] [mkroot] Cannot Overwrite non-directory "$ROOT/bin/" with directory "[Path to overlay]"

2024-05-07 Thread Rob Landley
On 5/7/24 15:50, Rob Landley wrote:
> And THAT was based on the old environment setup I used to do in Firmware Linux
> to give User Mode Linux a mostly writeable chroot despite starting with
> https://user-mode-linux.sourceforge.net/hostfs.html but that was back before 
> git
> was invented so I just have a bunch of tarball snapshots over the years (at
> https://landley.net/aboriginal/downloads/old/) rather than

A) Sorry, forgot to explain,

B) That's not even the old one I'm talking about,
https://landley.net/aboriginal/old/download/snapshots probably is.

User Mode Linux is a port of Linux to userspace, I.E. making the "vmlinux" ELF
file built at the top of the tree an actual runnable Linux program, which boots
its own little VM and runs processes inside it. This predated QEMU or KVM by a
decade, and was one of the first ways to run a virtual Linux system without
requiring root access on the host. Firmware Linux was built around it,
Aboriginal Linux was the relaunch targeting QEMU instead (and doing cross
compiling, because UML only ever properly supported x86 for some reason).

UML had the "hostfs" filesystem, which acted like a network filesystem making a
directory from the host appear in a directory of the virtual system. (Again,
decades before virtfs and friends, although NFS and Samba were around.)

The problem was, a hostfs file belonging to root (UID 0) wasn't writeable to
root within the VM. The mount point was SORT of writeable, but it was getting
translated on the host to reads/writes/renames/deletes as the host user running
UML, and then the translated syscall would fail and failures that shouldn't
happen were getting returned on the client. And this included fixups you needed
to do like replacing /etc/mtab with a symlink to /proc/mounts (because mount
points became a per-process attribute in Linux 2.5, so a single global mount
table as a filesystem maintained by the userspace mount tool didn't cut it 
anymore).

So I made a script that created a new directory in the host user's fully
writeable space and populated it with symlinks to host resources before
chrooting into it (all within UML), so I had access to the host stuff I needed
but could also replace it all as needed. And that's what I did my emulated Linux
From Scratch build in, back around 2004.

Anyway, "here's a thing that needs to be spliced into the $PATH, you may want to
use symlinks" sometimes goes "whoosh" over my head as "hard for people who
haven't done it before" because to ME it's a 20 year old trick. Sorry 'bout 
that...

Rob
___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] [mkroot] Cannot Overwrite non-directory "$ROOT/bin/" with directory "[Path to overlay]"

2024-05-07 Thread Rob Landley
On 5/6/24 23:33, Oliver Webb wrote:
> On Sunday, May 5th, 2024 at 21:21, Rob Landley  wrote:
> 
>> Oh, the other todo item here is "multiple overlays". The current overlay
>> package was a quick hack, never did the design work to figure out what more
>> complication should look like. Partly waiting for people to complain to me
>> that they need more than it does...
> 
> Maybe making the OVERLAY variable a delimiter separated list, looping over
> it each time the overlay package is specified. Then indexing the OVERLAY
> variable like an array with that counter (I don't really know how bash
> arrays work, I think this is easy with them though from my vague knowledge,
> although I don't know)?

Various things are easy to implement, the question is what user interface works
best. The rest of mkroot uses CSV internally so having CSV in an option isn't
that heavy a lift. (Although it hasn't been presented as external UI before, and
relative vs absolute paths in a comma separated list is a bit tricksy, and we
never DID address "what happens if you define the same variable twice on the
command line?" Right now it overwrites...)

The other sharp edge is "when files conflict between overlays, do you overwrite
or leave the old one or what".

And of course the "following symlinks out of tree" problem, which I added a tar
option to address and what I've vaguely thought of doing here was having toybox
tar handle it doing the tar c | tar xv trick with --restrict to just leverage
the existing stuff.

And I have a note about sparse file handling, which is at least 3 todo items
combined into one note:

1) cpio sparse handling is part of the periodic https://lwn.net/Articles/789228/
threads that never resolved last I checked, the last attempt wound up diverging
into https://lkml.iu.edu/hypermail/linux/kernel/2207.3/06939.html which
eventually went upstream (after it got completely rewritten to not smell like me
so Greg KH would tolerate it) but the cpio extension part didn't get brought
back up that I've been cc'd on...

2) tar sparse handling should have both modes (SEEK_HOLE and detect runs of
zeroes), and then the tar.test stuff updated to mostly use the runs of zeroes
because there are some TERRIBLE FILESYSTEM implementations out there and none of
them seem to agree on span granularity. (How big IS the run of zeroes? Where are
the edges? Just seek past 4k aligned blocks isn't good enough, and it doesn't
look like 64k is either. Don't get me started on "ecryptfs"...)

3) add sparse support to cp.c. (Grumble grumble --sparse longopt without short
opt, and should --sparse=auto be the default behavior? If the filesystem doesn't
support sparse files then presumably seek-and-write will zero fill anyway and we
don't have to do anything. Or seek would fail, which I guess we should
gracefully handle but sendfile_pad() already has plumbing for that?)
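
The "detect runs of zeroes" mode from item 2 is simple to sketch. This assumes
a caller-chosen minimum hole size (min_hole, a made-up parameter) since
filesystems don't agree on granularity. zero_run() and data_run() are
hypothetical names, not toybox functions:

```c
#include <stddef.h>

// Return how many consecutive zero bytes start at buf[0] (0 through len).
size_t zero_run(const unsigned char *buf, size_t len)
{
  size_t i = 0;

  while (i < len && !buf[i]) i++;

  return i;
}

// Return the length of the data extent starting at buf[0]: everything up to
// the first run of at least min_hole zero bytes, or len if there isn't one.
// Shorter zero runs count as data. min_hole must be nonzero.
size_t data_run(const unsigned char *buf, size_t len, size_t min_hole)
{
  size_t i = 0;

  while (i < len) {
    size_t z = zero_run(buf+i, len-i);

    if (z >= min_hole) break; // big enough to store as a hole
    i += z ? z : 1;           // skip a short zero run, or one data byte
  }

  return i;
}
```

Alternating data_run() and zero_run() over a file's contents gives the same
extent list on every filesystem, which is the property the tests want.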

>> It hasn't got "make". Kind of limiting factor not to have a make command on
>> the target.
> 
> gmake has a "./build.sh" that you can use to bootstrap it up on a system
> without make. My first step in this after I hacked together an overlay was
> to get a gmake tarball and try to build it with
> "./configure && ./build.sh && ./make install", which configure (on host bash,
> not toysh) goes into an infinite loop without expr, and putting that in will
> fail because "host compiler does not produce run-able executable" (Which
> might be true because I have to manually hack together an overlay each time
> and I throw out quick "hello world" tests mostly).

Good to know.

Way back when, I had a script that would splice the toolchain.sqf into the host
filesystem with a bunch of symlinks, ala
https://github.com/landley/aboriginal/blob/master/system-image.sh#L65 splicing
together
https://github.com/landley/aboriginal/blob/master/sources/toys/dev-environment.sh
and https://github.com/landley/aboriginal/blob/master/sources/toys/make-hdb.sh
although the interesting part was probably
https://github.com/landley/aboriginal/blob/master/sources/toys/dev-environment.sh#L72

And THAT was based on the old environment setup I used to do in Firmware Linux
to give User Mode Linux a mostly writeable chroot despite starting with
https://user-mode-linux.sourceforge.net/hostfs.html but that was back before git
was invented so I just have a bunch of tarball snapshots over the years (at
https://landley.net/aboriginal/downloads/old/) rather than

Pretty sure I have old blog entries at https://landley.livejournal.com
explaining what I was doing and why, but ever since the servers moved around I
haven't wanted to fish in them, archive.org is slow and has terrible UI, and my
backup disks from that period are... somewhere. Everything's still packed from
the move, I could 

Re: [Toybox] [mkroot] Cannot Overwrite non-directory "$ROOT/bin/" with directory "[Path to overlay]"

2024-05-05 Thread Rob Landley
On 4/27/24 20:44, Oliver Webb via Toybox wrote:
> Doing minimal linux system setup with mkroot and trying to create a minimal
> environment with a native toolchain to run autoconf in. This would mean
> getting the native static toolchain for my architecture from
> https://landley.net/toybox/downloads/binaries/toolchains/latest/.
> Mounting the image (Why are cross compilers tarballs while native compilers
> are fs images?
Copying the native compiler into the initramfs takes more space than initramfs
can comfortably hold. The run-qemu.sh in mkroot defaults to -m 256 (I.E. 256
megabytes system memory), and some board emulations (like mips) _can't_ map more
than that. (Making the boards consistent is good, it's enough to run a single
threaded compile, and it's nice for running lots of instances in parallel on the
host ala mkroot/testroot.sh.)

Even ignoring that, the kernel's cpio extractor generally has its own size
limits. The initial physical memory layout only leaves so large a gap between
"where we loaded the cpio.gz" and "where we extract it to", and when you fill up
that gap at a certain point the extract overwrites the data it's reading,
because initramfs isn't _expected_ to be multiple gigabytes in size. Again, how
much you've got varies by target but adding a quarter gigabyte of toolchain
didn't work on multiple boards when I tried it.

Shrinking the toolchain down has some hard limits: even way back in the
aboriginal linux days when I was trying to set up a tinycc compiler on target,
just the extracted /usr/include headers took up quite a bit of space:

$ cd ccc
$ du -s i686-*cross/*/include
23148   i686-linux-musl-cross/i686-linux-musl/include

Currently 23 megabytes (and another couple megabytes for the compiler includes).
Keeping them in a squashfs was more memory efficient.

> Wouldn't making them tarballs mean that you could extract their contents
> without running losetup and dealing with mounting devices and needing root
> permissions?

Squashfs is an archive format, there's an unsquashfs command to extract it if
you want to fiddle with it on the host, although mount-and-copy in mkroot works 
too.

The problem with (read-only) mounting a compressed archive is seekability: on
normal block devices the kernel can jump around and grab chunks of directory
information and file contents into dcache and page cache, and be free to discard
them again under memory pressure so they should be cheap to get back. That's the
design expectation for filesystems.

The problem with a tarball is you need to extract the whole thing starting at
the beginning to find where anything _is_. You can fix that by building an index
at mount time (extract the whole thing, examine the contents, and make notes)
but that makes mount really slow and also means you have a data tree you can't
discard so you've more or less pinned your directory cache if you want to know
where all the files start.
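
The "start at the beginning" part falls out of the format: each 512 byte ustar
header only says how big its own file is, so the location of the next header
has to be computed from the previous one. A minimal sketch of that step
(tar_next_header() is a hypothetical name, and this ignores the base-256 size
encoding GNU tar uses for very large files):

```c
#include <stdlib.h>
#include <string.h>

// Given the 512 byte ustar header found at archive offset off, return the
// offset of the next header. The size field is 12 bytes of octal text at
// byte 124, and file data is padded out to a multiple of 512 bytes.
// tar_next_header() is a hypothetical name for this example.
unsigned long long tar_next_header(const unsigned char *hdr,
  unsigned long long off)
{
  char field[13];
  unsigned long long size;

  memcpy(field, hdr+124, 12);
  field[12] = 0;
  size = strtoull(field, 0, 8);

  return off + 512 + ((size + 511) & ~511ULL);
}
```

Finding file N means calling that N times, reading a header from the archive
each time, which is the index-building pass a mount would have to do up front.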

Zip file format addresses the dentry part because it was designed to let you
extract individual files, but it doesn't address seekability _within_ a file. If
you try to seek 10 megs into a file (or mmap from that point) it has to extract and
discard 10 megs of data. (The main downsides of zip files: A) individually
compressing each file is less efficient than compressing the whole archive, so
they tend to be larger, B) zip puts all its metadata at the _end_ of the file,
so if the file is truncated at all you've lost ALL the contents because it
doesn't know what any of the rest means anymore. Incomplete zip file transfers
were worthless because it has to start reading at the end to find anything. The
reason it did that was so amending existing zip files in place was quick,
because it can remove and rewrite the metadata easily. If the metadata wasn't at
the end and needed to be expanded, it would either need to move all the file
contents to make room, or break the metadata into chunks and parse together
scattered overlays. Of course replacing a file in the archive wasted space
because unless the old file had coincidentally been at the very end of the
archive, it left the old one in there and just added the new copy and updated
the index to point at it.)

Most compression formats handle files in chunks: bzip2 does 900k blocks, gzip
does periodic dictionary resets, etc. Using a compression format with a
reasonable chunk size and tracking where each chunk starts lets you handle seeks
reasonably well, and that's what squashfs does. I haven't looked up the actual
file format, but conceptually it's a zip file plus chunk indexes within files.
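
A rough illustration of the chunk-index idea, not squashfs's actual on-disk format: the hypothetical helpers below gzip a file in fixed-size pieces, record each compressed length, and then read one piece back by skipping past the others instead of decompressing them. The names chunk_pack and chunk_read and the 64k chunk size are made up for this sketch.

```sh
CHUNK=65536   # uncompressed bytes per chunk (arbitrary for this sketch)

# chunk_pack INPUT ARCHIVE INDEX: compress INPUT one chunk at a time.
chunk_pack() {
  size=$(( $(wc -c < "$1") ))
  n=0
  : > "$2"
  : > "$3"
  while [ $((n * CHUNK)) -lt "$size" ]; do
    dd if="$1" bs="$CHUNK" skip="$n" count=1 2>/dev/null | gzip -c > "$2.tmp"
    echo $(( $(wc -c < "$2.tmp") )) >> "$3"   # index: one compressed length per line
    cat "$2.tmp" >> "$2"
    n=$((n + 1))
  done
  rm -f "$2.tmp"
}

# chunk_read ARCHIVE INDEX N: print uncompressed chunk N (counting from 0).
chunk_read() {
  off=0
  i=0
  while read -r len; do
    if [ "$i" -eq "$3" ]; then
      # Skip straight to the chunk's first byte; earlier chunks are never
      # decompressed.
      tail -c +$((off + 1)) "$1" | head -c "$len" | gzip -dc
      return
    fi
    off=$((off + len))
    i=$((i + 1))
  done < "$2"
  return 1
}
```

Reading chunk N costs one pass over the (small) index plus one chunk's worth of decompression, which is the property a tarball lacks.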

> I trust they were
> made fs images for a good reason, but... _why_).

Within mkroot, squashfs is easier to deal with because I don't need to reserve
destination space to extract everything into to poke at the contents. Outside of
mkroot, squashfs isn't that much harder to play with, mostly just less familiar.

> And ideally running a mkroot overlay on
> it because that's what the overlays seem to be 

Re: [Toybox] Fw: Re: Dude.

2024-05-05 Thread Rob Landley
On 5/4/24 11:34, Oliver Webb via Toybox wrote:
> (Rob wants this on the list anyways, and he hasn't CC:-ed it.

If I want to send a message to the list, I'm capable of doing so.

As I said in the postscript I don't _object_ to it being on the list (in an "I
say the same things in public as in private" way), and I did lament that I'd
spent half a work day composing several thousand words to just one person rather
than as a general resource I could refer back to in future or maybe get a FAQ
entry out of.

But thinking about it after the fact (when I got your reply), I honestly didn't
expect other people on the list to be interested beyond maybe closure. It's
potentially useful to know that the guy who wrote about half of all messages to
the list last month (35/76 in the web archive) might stop.

>  I want it on the list for multiple reasons.

Or might not.

Reading lots of text is _work_. I reference "pascal's apology" a lot (him being
sorry for writing a long letter because he didn't have time to write a short
one), because people try to _read_ this stuff. (Or worse, they stop trying.) I
try to keep the signal to noise ratio up, and that means editing it DOWN. Which
takes time and energy. (This reply is uncomfortably rambly, but I've already
spent a day away from the keyboard going "I have to coherently reply to this"
and not wanting to.)

> (I gave him permission
> to cc it in a reply email I intend to forward to the list))

And I didn't, so you put words in my mouth again about what I "want".

This thread doesn't advance the project, and I doubt this exchange offers much
insight into _my_ behavior. I've been posting publicly for a quarter century on
linux-kernel and busybox and uclibc and toybox and j-core and elsewhere. I've
maintained _this_ project for 17 years. I've made policy statements about it in
design.html and the faq and on the list and in my blog (and twitter and mastodon
and livejournal and talks on youtube and mp3 recorded panels from penguicon and
linucon and heck, you can pull my old comments out of slashdot and lwn.net if
you try). There's even a code of conduct which HIGHLY IRONICALLY was originally
copied from twitter's. (No really:
https://github.com/landley/toybox/commit/bc308973ffb6) People already have
_plenty_ of rope to hang me with if they decide they need a reason.

I care about the _code_, and what's best for the _project_. I also care about
documentation, but the problem is usually "too much" and needing to boil it down
and put it somewhere obvious where it's indexed and people can easily find it.

I'm trying NOT to make it about me. I'm very fiddly about the work, sometimes
trying (and failing) to do the programming equivalent of Faberge Eggs, and the
perfect can be the enemy of the good. But that's what distinguishes this project
from the half-dozen other implementations of the same stuff already out there.

That said, you pointed me at a message where you'd asked an actual question:

> > > because I've been trying to run gcc under mkroot and a response to
> > > http://lists.landley.net/pipermail/toybox-landley.net/2024-April/030334.html
> > > would've been helpful.
>
> Hadn't seen it. It got, quite literally, lost in the noise.

And I'd started replying to that (before you sent this to the list), and stopped
because it was too long and I needed to edit it down. I should just press "send"
on the ramble and move on to the next todo item. (Top of stack is fixing unshare
I think...)

Rob


Re: [Toybox] unshare/nsenter and flags

2024-05-02 Thread Rob Landley
On 5/2/24 13:14, enh via Toybox wrote:
> another googler wanted a host unshare(1) for some testing... i added
> that, and they complained that although the docs say
> 
> -r Become root (map current euid/egid to 0/0, implies -U) (--map-root-user)
> 
> it seems like -r _doesn't_ actually imply -U in practice (and they
> seemed to have strace output to prove it).

So... should it?

What did they try to do, and what did they _want_ to happen?

I'd compare with my debian unshare command but my install is a bit out of date.
(According to https://endoflife.date/devuan I've still got 4 weeks of support.)

Coincidentally, I just got an email yesterday morning from "The Happy Dreamhost
Upgrade Robot" (yes really) that they're updating landley.net's web container:

> We have great news! As part of our mission to support you with your digital
> presence, we are always looking to improve your products and provide you with
> the most advanced and powerful hardware.
> 
> On Wednesday, May 8th we will be migrating you to a newer shared server. As
> part of this maintenance, the operating system will be upgraded from Ubuntu
> Bionic to Ubuntu Jammy Jellyfish 22.04.2.
> 
> In most cases, no action is required on your part, but we've prepared some
> documentation that will help you prepare for the upgrade to Ubuntu Jammy:
> https://help.dreamhost.com/hc/en-us/articles/15506945971220

The "22.04" means it came out two years and one month ago, and that's what
they're migrating me TO. So, you know, I can presumably feel less bad about my
laptop...

> i was assuming the code was just missing, but when i looked, i found:
> 
> // unshare -U does not imply -r, so we cannot use [+rU]
> if (test_r()) toys.optflags |= FLAG_U;

Let's see, git annotate says that comment comes from commit 3c0be8a473c0:

Author: Samuel Holland 
Date:   Sun Apr 12 16:00:16 2015 -0500

unshare: fix -r

Calling unshare(2) immediately puts us in the new namespace
with the "overflow" user and group ID. By calling geteuid()
and getegid() in handle_r() after calling unshare(), we try
to map that to root, which Linux refuses to let us do.

What we really want to map to root is the caller's uid/gid
in the original namespace. So we have to save them before
calling unshare().

Meanwhile the "implies" in the help text comes from commit fb4a241f35cf two
months earlier:

Author: Rob Landley 
Date:   Wed Feb 18 15:19:15 2015 -0600

Patch from Isaac Dunham to add -r, fixed up so it doesn't
try to include two flag contexts simultaneously.

So it looks like Isaac made -r imply -U and Samuel made it _not_ do so, without
changing the help text, and I didn't notice because I'd really like to build
domain expertise here but haven't got it. (Largely because doing container stuff
tends to require root access, and if I'm requiring root access anyway I tend to
just chroot, or launch a qemu instance that does NOT require root access on the
host. It's on the todo list...)
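
The ordering problem in Samuel's commit message can be sketched in shell terms, assuming the usual Linux map line format of "inside-id outside-id count". The helper name is hypothetical and this isn't toybox's handle_r(); it only shows why the ids get captured first.

```sh
# Capture the caller's ids while still in the original namespace. After
# unshare(2) these would read back as the "overflow" ids instead.
outer_uid=$(id -u)
outer_gid=$(id -g)

# root_map_line ID: one-entry map making ID (as seen outside) appear as 0.
root_map_line() {
  printf '0 %s 1\n' "$1"
}

root_map_line "$outer_uid"   # contents intended for /proc/PID/uid_map
root_map_line "$outer_gid"   # contents intended for /proc/PID/gid_map
```

An unprivileged process also has to write "deny" to /proc/PID/setgroups before the kernel accepts the gid_map write.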

I've used toybox's unshare command a bunch of times, but not the UID remapping
parts...

> but note the unshare/nsenter sharing there --- is it a problem that i
> have unshare enabled but not nsenter? is that expected to work?

I'm happy to implement proper semantics here if I know what they _are_. What
_should_ it do?

I recently blogged (https://landley.net/notes.html#13-04-2024) about attending
yet another container talk at txlf, but if I really want a "contain" command
what I should probably do is dig through:

  https://github.com/p8952/bocker
  https://github.com/Fewbytes/rubber-docker
  https://blog.lizzie.io/linux-containers-in-500-loc.html

And "come up with something". It would be really nice if there was a simple
existing syntax I could be compatible with, which is why I was vaguely looking
at what minijail does, and https://github.com/rkt/rkt and
https://github.com/opencontainers/runc and https://github.com/containers/crun
and https://github.com/containerd/containerd and so on.

But that's a fresh can of worms to open after I close a couple of existing ones,
and to get to 1.0 the LFS build needs "awk" more than container support...

Rob


Re: [Toybox] nproc(1)

2024-04-30 Thread Rob Landley
On 4/29/24 16:56, enh via Toybox wrote:
> isn't nproc(1) just a call to sysconf(3) with either
> _SC_NPROCESSORS_ONLN for regular behavior, or _SC_NPROCESSORS_CONF for
> --all?

From musl src/conf/sysconf.c:

case JT_NPROCESSORS_CONF & 255:
case JT_NPROCESSORS_ONLN & 255: ;
unsigned char set[128] = {1};
int i, cnt;
__syscall(SYS_sched_getaffinity, 0, sizeof set, set);
for (i=cnt=0; i<...
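
For comparison from the shell side, the two sysconf(3) values enh mentions are also reachable through getconf(1). A minimal sketch, assuming the host's getconf knows both names (glibc's does); nproc_sketch is a hypothetical stand-in, not toybox's nproc:

```sh
# nproc_sketch [--all]: print the online CPU count, or the configured count
# when --all is given.
nproc_sketch() {
  if [ "$1" = "--all" ]; then
    getconf _NPROCESSORS_CONF
  else
    getconf _NPROCESSORS_ONLN
  fi
}
```

In the musl excerpt above both names share one case body, so on a musl system the two calls would presumably print the same number.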


Re: [Toybox] DreamHost Security Alert

2024-04-25 Thread Rob Landley
On 4/24/24 13:10, Rob Landley wrote:
> Alas, my website's likely to be down for a bit while I explain to them that
> "the compiler that got used to build an exploit" and "the exploit" can share
> strings because gnu is incompetent and leaks the path where things got built
> into the resulting binaries, but that does not mean that the compiler the
> strings came from in the first place is actually infected.

And it's back. Human saw the email thread at 9am and took reasonable action.

I was a little annoyed it was down all day, but eh: nine fives. Close enough.
They're cheap and I don't have to do it.

Rob

(Before them I had a server with a static IP where I ran all my own servers,
which meant I had one DNS server pointing to all the other services, and a
number of sites went "but DNS says you need TWO authoritative servers" and I
went "I'm not paying for a second static IP and all the records would point to
the first static IP so if it goes down what does being able to look up the name
of the services that aren't currently THERE accomplish?" And that's before DNS
required cryptographic signatures, and then "sender permitted from" showed up in
email around then and NONE of those checkers would work without 2 DNS servers so
I _couldn't_ set it up... So yes I _could_ get one of my orange pi boards sent
to one of the raspberry pi hosting sites that give a static ipv4 as part of the
hosting package, but... I really don't want to?)


Re: [Toybox] DreamHost Security Alert

2024-04-24 Thread Rob Landley
Alas, my website's likely to be down for a bit while I explain to them that "the
compiler that got used to build an exploit" and "the exploit" can share strings
because gnu is incompetent and leaks the path where things got built into the
resulting binaries, but that does not mean that the compiler the strings came
from in the first place is actually infected.

I mean, here's an article from 2018:

https://www.bleepingcomputer.com/news/security/mirai-iot-malware-uses-aboriginal-linux-to-target-multiple-platforms/

Rob

(I'd point to old blog entries where I went "huh, my compilers got used to build
random russian malware" ten years ago, but my blog was on my site so you
wouldn't see it unless I fish it out of archive.org...)

-------- Forwarded Message --------
Subject: DreamHost Security Alert - Malware on landley.net
Date: Wed, 24 Apr 2024 09:53:09 -0700 (PDT)
From: DreamHost Abuse Team 
To: r...@landley.net

Hello Rob Landley,

We have received a report of malware at the following location:

hXXps://landley.net/aboriginal/downloads/old/binaries/1.2.6/cross-compiler-armv7l.tar.bz2

This means that your site has likely been compromised. We have taken the site
offline by renaming its directory (appended _DISABLED_BY_DREAMHOST). Please do
not re-enable it until you can address the problem.

In general, the three most common entry points for a compromised website are:

1. Vulnerable, typically out-of-date software (such as blogs, forums, CMS,
associated themes and plugins, etc.)
2. A cracked/brute-forced admin login for a web application like WordPress,
Joomla, Drupal etc.
3. A compromised FTP/SFTP/SSH user password.

1. All software you have installed under your domain should always be kept
up-to-date with the most recent version available from the vendors' website, as
these often contain security patches for known issues. Older versions of
well-known and popular web software (including Wordpress, Drupal, Joomla, etc.)
are known to have vulnerabilities that can allow injection and execution of
arbitrary code.

2. If you utilize a web application with a script-based administrative backend
(like WordPress, Joomla, or Drupal), make sure that you're not using a generic
username like "admin" or "webmaster" for the user with administrative
privileges. Hackers will slowly brute-force common usernames in order to get
access to a script's backend and whatever tools exist there that allow file
uploads, alterations, or execution of code.

3. FTP/SFTP/SSH passwords can be compromised and used to modify files. The most
important part of securing your account in this case is to change your FTP
user's password via the (USERS > MANAGE USERS) -> "Edit" area of the control
panel. Passwords should not contain dictionary words and should be a string of
at least 8 mixed-case alpha characters, numbers, and symbols. It is also
recommended to always use Secure FTP (SFTP) or SSH rather than regular FTP,
which sends passwords over the internet in plaintext. You can disable FTP for
your user(s) within the DreamHost panel (USERS > MANAGE USERS) section.

At this point, we recommend logging into your DreamHost server and removing the
content we listed. (Note: You may first need to reset the permissions). You
should also look for any other files/directories you did not upload yourself and
update all your website components where applicable. As for determining which
entry point is the cause of this incident, for 1 and 2, you can review the
Apache logs for suspicious activity and requests to suspicious files. Keep in
mind that we typically only keep around 5 days worth of Apache logs. For 3, you
can refer to this article to find recent logins to your user:
https://help.dreamhost.com/hc/en-us/articles/214915728-Determining-how-your-site-was-hacked

For further help on this topic, you can refer to our Knowledge Base:

https://help.dreamhost.com/hc/en-us/articles/215604737-Hacked-sites-overview
https://help.dreamhost.com/hc/en-us/sections/203242117-Logs

Lastly, we have scheduled an automated malware scan and if anything is found, we
will send you a separate email with those results.

If you need further assistance, please respond directly to this email.

Thank you for your cooperation!
-DreamHost Abuse Team


Re: [Toybox] [PATCH] xxd: -d Decimal Lables flag, Don't cap at one file

2024-04-22 Thread Rob Landley
On 4/22/24 17:17, enh via Toybox wrote:
> ah, yeah, the _include_ path uses the full buffer and -r uses stdio
> buffering, but "regular" xxd was doing neither. i've sent out the
> trivial patch to switch to stdio.

Ah, performance tweak.

*shrug* Applied...

Rob


Re: [Toybox] [PATCH] xxd: buffer input via stdio.

2024-04-22 Thread Rob Landley
On 4/22/24 17:17, enh via Toybox wrote:
> ---
>  toys/other/xxd.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)

What's the issue this fixes? It's not:

  for i in $(seq 1 100); do echo $i; sleep 1; done | ./xxd

Because that won't produce output for a couple minutes...

Rob


Re: [Toybox] shuf works on FreeBSD

2024-04-20 Thread Rob Landley
On 4/20/24 03:42, Vidar Karlsen via Toybox wrote:
> toys/other/shuf.c builds and runs on FreeBSD and can be enabled in
> freebsd_miniconfig with CONFIG_SHUF=y.
> 
> I can't think of a use case for it, but I'm sure there are some.

I thought it was enabled in commit 93c8ea40a back in November?

Rob


Re: [Toybox] Microsoft github took down the xz repo.

2024-04-16 Thread Rob Landley
On 4/15/24 03:53, Jarno Mäkipää wrote:
> On Sun, Apr 14, 2024 at 9:14 AM Oliver Webb via Toybox
>  wrote:
>>
>> To revive a old thread with new technical info I stumbled upon:
>>
>> On Saturday, March 30th, 2024 at 15:58, Rob Landley  wrote:
>>
>> > I set up gitea for Jeff on a j-core internal server, and it was fine except
>> > it used a BUNCH of memory and cpu for very few users. Running cgi on
>> > dreamhost's servers is a bother at the best of times (I don't want to worry
>> > about exploits), and the available memory/CPU there is wind-up toy levels.
>> >
>> > My website is a bunch of static pages rsynced into place, some of which use
>> > xbithack to enable a crude #include syntax, and that's about what the server
>> > can handle.
>>
>> Going through the list of "minimal tools" on https://suckless.org/rocks/,

Not really a fan of that site. I did a roadmap section on them long ago
(https://landley.net/toybox/roadmap.html#sbase), but I'm trying to implement
mostly compatible versions of things that already exist, and they're trying to
invent new things that didn't previously exist because https://xkcd.com/927/
which I mostly consider fragmentation rather than helping, and I try not to
encourage them.

>> I stumbled upon a git frontend called stagit
>> (https://git.codemadness.org/stagit/file/README.html)
>> which the suckless project uses as its git frontend.

When microsoft bought github I mirrored my repo on my website so you could pull
it from there, but doing that doesn't have any web interface so I did a quick
and dirty bash script to upload the "git format-patch" of each commit, with
symlinks from the 12 character hash to the full hash (because doing _each_ one
was an insanely slow exercise in inode exhaustion).

You're once again telling me what I did was not good enough for you, and that I
am wrong, and must change to suit you.

>> But to have a solution, you must have a problem. The 2 main issues I have
>> with the current git management are the fact

I'm very tired.

>> there doesn't seem to be a way to clone the current repo directly from
>> landley.net (Making Microsoft GitHub the middleman).


$ git annotate www/header.html | grep -w git
fb47b0120   (Rob Landley2021-09-12 14:33:36 -0500   30)  
https://landley.net/toybox/git>local
$ git show fb47b0120
commit fb47b0120f7aa73c0821a8c55e15540d83baed01
Author: Rob Landley 
Date:   Sun Sep 12 14:33:36 2021 -0500

Add a local git mirror (todo item since github was acquired)...

diff --git a/www/git/index.html b/www/git/index.html
new file mode 100644
index ..bade8d1b
--- /dev/null
+++ b/www/git/index.html
@@ -0,0 +1 @@
+Not browseable: git clone https://landley.net/toybox/git

$ git log scripts/git-static-index.sh
commit 990e0e7a40e4509c7987a190febe5d867f412af6
Author: Rob Landley 
Date:   Sat Dec 24 06:34:11 2022 -0600

Script to put something browseable in https://landley.net/toybox/git

https://landley.net/notes-2022.html#22-12-2022

>> And the fact I can't browse the source code without github or android code
>> search acting as the middleman

I do not have source tree snapshots up. Kinda hard to do in a static manner
without uploading rather a LOT of files (and even if you upload each version of
"git log" for each file and create an index file for each commit with the ls -lR
of the whole tree linking to the relevant version, the URLs to the files are
ugly. I can do it, but don't really want to? Linking to individual lines of the
file while also having the raw text kinda implies uploading two versions and I
just dowanna. Oh, and dreamhost's server config doesn't have sane file
associations for all the types so if I put up a .c file it wants to DOWNLOAD it
instead of displaying it as text and trying to .htaccess that is more of a pain
than I'm up for, so I would wind up having blah.c.txt and blah.c.html files and
that's just ugly...)

Plus, syntax highlighting: you'd THINK there would be some nice linux syntax
highlighting packages out there but not counting "use vi" (which doesn't work
for me anyway, :syntax = "E319: Sorry, the command is not available in this
version")...

Searching around I found https://github.com/alecthomas/chroma which is very
proud that it's written in "pure go"... except it's a wrapper for a python
library, and python's runtime is written in C, so DEFINE PURE...

Digging into the aforementioned python (don't get me started) library, the
"python-pigmentize" package installs the man page for a command "pygmentize",
and the bash completion for the command pygmentize, but does not install the
actual command in the $PATH (or anywhere

Re: [Toybox] df not working on FreeBSD

2024-04-16 Thread Rob Landley
On 4/15/24 04:37, Vidar Karlsen via Toybox wrote:
> Hello,
> 
> df throws the following error on FreeBSD:
> 
> root@140amd64_noopts-usrports:/usr/local/toybox/bin # ./df /
> df: getmntinfo: Invalid argument
> 
> A little bit of poking around shows that getmntinfo expects the second
> argument (the mode) to be one of these, and not 0:

Presumably it worked at one point, but I didn't write that bit...

> sys/sys/mount.h:
> #define MNT_WAIT    1   /* synchronously wait for I/O to complete */
> #define MNT_NOWAIT  2   /* start all I/O, but do not wait for it */
> #define MNT_LAZY    3   /* push data not written by filesystem syncer */
> #define MNT_SUSPEND 4   /* Suspend file system after sync */
> 
> Changing 0 to MNT_NOWAIT in portability.c makes df happy again.

And doesn't break macos, so I'm not adding the #ifdef in your patch. (I don't
have an openbsd test environment lying around, but
https://man.openbsd.org/getmntinfo.3 links to
https://man.openbsd.org/getfsstat.2 which says the options are MNT_WAIT and
MNT_NOWAIT so presumably they're happy too.)

Commit 7d9ee89d3cf8.

Rob


Re: [Toybox] httpd: How is this supposed to be _used_?

2024-04-14 Thread Rob Landley
On 4/13/24 14:09, Oliver Webb via Toybox wrote:
> The first thing I ran into is that httpd doesn't do that by default,
> running "toybox httpd dist/" won't actually host those pages
> on localhost.

It's an inetd client:

  https://en.wikipedia.org/wiki/Inetd

  toybox netcat -s 127.0.0.1 -p 8 -L httpd .

I've been meaning to come up with an actual inetd, and possibly lib/*.c plumbing
to do standalone servers, but nommu support and rate limiting incoming
connections and so on all go in a layer I haven't implemented yet and am not
interested in reproducing in multiple commands.

Genericizing the plumbing I've already got in netcat, but making it available
from individual commands, implies having a standard set of command line
utilities that get exposed in commands to specify address to bind to and port to
listen on and max simultaneous connections (including max per source IP) and
output inactivity timeout. Possibly some sort of IPSERVER macro flung into
the option string, with a corresponding structure in TT and then a function I
call? Or maybe just stick with inetd so it's somebody else's problem...

I explain this here periodically, by the way:
http://lists.landley.net/pipermail/toybox-landley.net/2024-January/03.html

In theory a tcpsvd was contributed to pending long ago, which kind of has the
od/hexdump/xxd problem of multiple implementations not sharing code (as I
periodically mention here, ala
http://lists.landley.net/pipermail/toybox-landley.net/2023-January/029410.html).
It's in the todo heap...

> "Why?": Looking at the source code and typing
> input into httpd, it wants input from stdin and seemingly outputs to
> stdout like a normal unix tool (which httpd is usually not).

As all inetd clients do, yes: nbd_server.c is another one. Lots of other things
(like the tftpd in pending, or dropbear) can work in inetd mode.
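
To make the inetd model concrete, here is a toy handler in that style. It is only a sketch (tiny_handler is a hypothetical name and nothing like toybox's httpd): the listener owns the socket, and the handler sees one connection as plain stdin/stdout.

```sh
# tiny_handler: read one request line from stdin, answer on stdout.
# Whatever accepted the connection has already wired the socket to fd 0/1.
tiny_handler() {
  read -r method path version
  body="you asked for $path"
  printf 'HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\nContent-Length: %s\r\n\r\n%s' \
    "${#body}" "$body"
}
```

Because there's no socket code in it, it can be poked at without a network:

  printf 'GET /index.html HTTP/1.0\r\n\r\n' | tiny_handler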

> Forgive me, but I'm going to compare this to busybox httpd.

You do you.

Rob


Re: [Toybox] today in "shut up, gnu!"

2024-04-13 Thread Rob Landley
On 4/12/24 13:24, enh via Toybox wrote:
> ~/aosp-main-with-phones$ find external/ -name NOTICE -type l -maxdepth 2
> find: warning: you have specified the global option -maxdepth after
> the argument -name, but global options are not positional, i.e.,
> -maxdepth affects tests specified before it as well as those specified
> after it.  Please specify global options before other arguments.
> 
> (it does do the right thing, but insists on whining first.)

I've hit that too, and am big into Not Doing That. Thought I'd blogged about it,
but it could have been irc, or twitter (which I deleted when twitler bought it
but have an archive I should probably post somewhere), or... probably too old
for mastodon?
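
For scripts that have to run under GNU find too, putting the global option directly after the starting point keeps it quiet. A sketch of enh's command reordered (find_notice_links is a hypothetical wrapper name):

```sh
# find_notice_links DIR: symlinks named NOTICE at most two levels below DIR.
# -maxdepth comes before the tests, so GNU find has nothing to warn about.
find_notice_links() {
  find "$1" -maxdepth 2 -name NOTICE -type l
}
```

Same matches either way; only the nag goes away.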

There's a reason I get so exasperated about each new gnu/nag I stub my toe on.
It's gone beyond isolated incident into "pattern of looking down on everyone
else and sneering".

Unix has always been a silent protagonist, without which shell scripts are a
pain to do. If it doesn't work, they'll figure out why. Just behave consistently
(according to SOME kind of understandable logic) and let them keep the pieces.
Sometimes there's a -v flag to activate printfs() stuck into the code, but don't
express opinions when they didn't ASK. (Put them in the man page or --help if
it's that important.)

This has ALWAYS been the unix way. There are ALWAYS corner cases, and
deterministic behavior is not difficult to debug. The gnu/FSF never got that.
Stallman only decamped to unix under protest, a refugee from the Jupiter
project's collapse orphaning ITS, and he never really understood it.

RMS did not INVENT the idea of cloning unix with his big announcement in 1983.
Unix was a diverse community starting from the 1974 ACM article, let alone the
Berkeley Software Distribution in 1975. The first full from-scratch Unix clone
(writing their own kernel, compiler, and command line) was Coherent, which
shipped in 1980. Paul Allen copied subdirectories and file descriptors from unix
into DOS 2.0 not long after. Minix started in 1983 and shipped in 1986, and
Linux is 100% a descendant of Minix (developed on minix, its first filesystem
was minix, the development discussion on comp.os.minix, he inherited 80% of the
minix community because he took patches and Tanenbaum didn't...) There's a
famous tanenbaum-torvalds debate preserved for posterity, there is NOT a
stallman-torvalds debate because nobody cared what stallman had to say.

Nor did he invent freeware, which was the universal norm before the Apple vs
Franklin decision in 1983 because you couldn't copyright binaries before Steve
Jobs got the appeals court to change the law. Byte and Compute magazines had
basic listings in the back of each issue for you to type in, decus and CP/M
northwest had software libraries, the commodore 64 came bundled with a disk of
Jim Butterfield's software but he didn't WORK for them: he founded the Toronto
Pet User's Group (TPUG) and published free software with source code.

But Stallman mansplained at everyone else at the top of his lungs nonstop from
the moment he showed up, and there are all sorts of topics that can't NOT have
an "as opposed to what stallman's saying, the truth is" section today...

  https://en.wikipedia.org/wiki/Freeware

Sigh, watching https://youtu.be/2gOGHdZDmEk and https://youtu.be/WWfsz5R6irs and
https://youtu.be/9RO5ZAmzjvI every time the narration talks about Pierre Sprey I
get Stallman vibes. There's a broadcast version of Dunning-Kruger where you
plausibly preach to an audience who doesn't know better, and become The Expert
that everybody must get a quote from every time something happens in that area,
while the people actually doing the work facepalm at every third word.

Rob


Re: [Toybox] uname no longer broken on FreeBSD?

2024-04-13 Thread Rob Landley
On 4/13/24 03:00, Vidar Karlsen via Toybox wrote:
> Hello,
> 
> toys/posix/uname.c builds and runs on FreeBSD now. I have tested it on
> 13.2-amd64, 14.0-amd64, 13.2-arm64 and 14.0-arm64. I think it's safe to
> put CONFIG_UNAME=y back into kconfig/freebsd_miniconfig.

Ah, commit d2bada0e42e6 fixed it but I only remembered to add it to
macos_defconfig, forgot the other one.

Thanks,

Rob


Re: [Toybox] prereq build, what is the motivation behind building od?

2024-04-12 Thread Rob Landley
On 4/8/24 14:20, Oliver Webb via Toybox wrote:
> Although I may be wrong, "od" doesn’t seem to be in
> the build infrastructure. What’s the reason for it being a
> "prereq" command?

$ vi scripts/recreate-prereq.sh
...
$ grep '^od ' log.txt
od "-Anone" "-vtx1"

https://github.com/landley/toybox/blob/0.8.11/scripts/make.sh#L230

> Also, have you thought about specifying FILES through
> the command line to reduce build time by only building what we need to?

Have I thought about micromanaging the build in a way that may not link in
combination with a given set of generated/*.h files? Probably at some point.

Keep in mind I've been doing this stuff on and off since... depending on how you
want to look at it, 1999.

> Scanning
> for commands with “which”

"which" looks at what's installed on the host out of the $PATH. What does that
have to do with what's configured in toybox? (If I supplied an airlock I
specified the $PATH...)

> and maybe uname for stuff like gsed

You mean use uname to figure out if we're running on MacOS or FreeBSD like the
code already does in scripts/portability.sh?

> and putting them
> in FILES if we don’t have a good enough version.

I built defconfig under record-commands, and then did the standard "awk '{print
$1}' | sort -u | xargs" trick from literally _decades_ ago:

https://github.com/landley/aboriginal/blob/dbd0349d8ae6/sources/toys/report_recorded_commands.sh#L10

https://landley.net/aboriginal/FAQ.html#:~:text=logging%20wrapper

to get a list of the commands used by that, and used that to generate a toybox
.config file enabling those commands.

I then made a SHELL SCRIPT that DID ALL THAT so you could SEE HOW/WHY IT WAS
BUILT (and also so I could automate updating it, yes I should probably add it to
release.txt):

https://github.com/landley/toybox/blob/master/scripts/recreate-prereq.sh

And tried to explain that I'd done so:

https://github.com/landley/toybox/commit/d1acc6e88be5

And how to use the result:

https://github.com/landley/toybox/commit/3bbc31c78b41

> Then generating generated/
> files based off of that?

No.

Rob


Re: [Toybox] [PATCH] timeout.test: reduce flake.

2024-04-12 Thread Rob Landley
Catching up. (I let stuff pile up preparing for the release and then took a
couple days off, and now I'm at texas linuxfest doing sleep deprived talk prep
for tomorrow...)

On 4/8/24 15:28, enh via Toybox wrote:
> A (presumably overloaded) CI server saw the `exit 0` test time out.
> Given that several of these tests should just fail immediately,
> having a huge timeout isn't even a bad thing --- if we had a bug
> that caused us to report the correct status, but not until the
> timeout had _also_ expired, this would make that failure glaringly
> obvious.
> 
> Aren't the other tests with 0.1s timeouts potentially flaky? Yes,
> obviously, but I'll worry about those if/when we see them in real
> life? (Because increasing those timeouts _would_ increase overall
> test time.)

Yes it should never happen, but 11 minutes seems like a footgun.

I bumped it up to 1 second (10 times as long as before). If you see it again I
can bump it to 5 seconds, but much beyond 1 second and the "timeout -v .1 sleep
3" test later on gets flaky, as does:

toyonly testcmd "-i" \
  "-i 1 sh -c 'for i in .25 .50 2; do sleep \$i; echo hello; done'" \
  "hello\nhello\n" "" ""

Rob


[Toybox] Release 0.8.11 is out.

2024-04-08 Thread Rob Landley
Yeah, a bit overdue. Lemme know if anything in the release notes isn't clear.

Still doing a texas linuxfest talk soonish. Hopefully they post a video
eventually...

Rob


Re: [Toybox] tail test failures?

2024-04-08 Thread Rob Landley
On 4/8/24 11:14, enh via Toybox wrote:
> looks like the github CI has been red for ubuntu and macOS since april 5th?
> 
> this revert fixes the current failing test:
> 
> [master 8368f8f9] Revert "Enforce min/max for % input type (time in
> seconds w/millisecond granularity)."
> 
> but that just gets me a different failing test, so it's obviously a
> bit more subtle than that :-)

Darn it, didn't get a release out on leap day, didn't get a release out during
the eclipse... Always one more thing.

(Pay no attention to the binaries I just uploaded, gotta rebuild them and do it
again. This is why I push the tag and update the news.html file on the website
LAST...)

Rob


Re: [Toybox] utf8towc(), stop being defective on null bytes

2024-04-08 Thread Rob Landley
On 4/8/24 11:53, Oliver Webb wrote:
> Still, U+0000 is a valid code point, and having a special case especially for
> it that isn’t mentioned but you have to watch out for is either a bug or a
> documentation error.

I say it's intentional, you reassert that I'm wrong.

I'll leave you to your opinion...

Rob


Re: [Toybox] utf8towc(), stop being defective on null bytes

2024-04-08 Thread Rob Landley
On 4/8/24 11:01, enh wrote:
>> > Returning length 0 means we hit a null terminator,
>>
>> Null bytes aren't always "terminators". You can embed null bytes into data
>> and still want to do utf8 processing with it.
> 
> that's questionable ... the desire to have ASCII NUL in utf-8
> sequences (without breaking the "utf-8 sequences are usable as c
> strings" property) is the main reason for the existence of "modified
> utf-8".

You don't need a conversion function to grab a nul byte, you can check if it's a
null byte.

That value _is_ a special case, the enclosing loop can deal with it easily
enough (there's nothing to convert, it's a NUL byte, check directly). I've got
functions like regexec0() that work over a range instead of using a NUL, and
those have to deal with libc's regex stopping at NUL so the enclosing loop
advances past it and restarts.
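
A minimal sketch of that enclosing-loop approach, under stated assumptions:
sketch_utf8towc() is a hypothetical stand-in that follows the return
convention discussed in this thread (bytes consumed, 0 for NUL, -1 error,
-2 need more data). It is not toybox's utf8towc() and does no overlong or
surrogate checking. The point is only that the caller tests for the NUL byte
itself and never asks the decoder to convert it.

```c
// Hypothetical stand-in decoder (NOT toybox's utf8towc): returns bytes
// consumed, 0 for a NUL byte, -1 on error, -2 if the sequence is truncated.
int sketch_utf8towc(unsigned *wc, const char *str, unsigned len)
{
  unsigned char c;
  unsigned result, need, i;

  if (!len) return -2;
  c = *str;
  if (!c) return 0;                      // NUL: nothing to convert
  if (c < 0x80) { *wc = c; return 1; }   // plain ASCII
  if ((c & 0xe0) == 0xc0) { need = 1; result = c & 0x1f; }
  else if ((c & 0xf0) == 0xe0) { need = 2; result = c & 0x0f; }
  else if ((c & 0xf8) == 0xf0) { need = 3; result = c & 0x07; }
  else return -1;                        // stray continuation or bad lead byte

  for (i = 1; i <= need; i++) {
    if (i == len) return -2;             // ran out of input mid-sequence
    c = str[i];
    if ((c & 0xc0) != 0x80) return -1;   // not a continuation byte
    result = (result << 6) | (c & 0x3f);
  }
  *wc = result;

  return need + 1;
}

// Count code points in buf[0..len), a range that may contain NUL bytes.
// The loop checks for NUL directly and only calls the decoder for other
// bytes. Returns -1 on a malformed or truncated sequence.
int count_codepoints(const char *buf, unsigned len)
{
  unsigned wc, pos = 0;
  int count = 0, used;

  while (pos < len) {
    if (!buf[pos]) { pos++; count++; continue; }   // NUL: handled here
    used = sketch_utf8towc(&wc, buf + pos, len - pos);
    if (used < 1) return -1;
    pos += used;
    count++;
  }

  return count;
}
```

With the four byte buffer "a", NUL, 0xc3, 0xa9 this counts three code points,
because the loop steps over the NUL on its own.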

Rob


Re: [Toybox] utf8towc(), stop being defective on null bytes

2024-04-07 Thread Rob Landley
On 4/6/24 17:48, Oliver Webb via Toybox wrote:
> Heya, looking more at the utf8 code in toybox. The first thing I spotted is
> that utf8towc() and wctoutf8() are both in lib.c instead of utf8.c, why
> haven't they been moved yet, is it easier to track code that way?

The "yet" seems a bit presumptuous and accusatory, but given the title of the
post I suppose that's a given.

I have no current plans to move xstrtol() from lib.c to xwrap.c. And atolx() is
only called that instead of xatol() because it does suffixes.

The reason it had to go in lib.c back in the day was explained in the commit
that moved it to lib.c:

  https://github.com/landley/toybox/commit/6e766936396e

As for moving it again someday, unnecessarily moving files is churn that makes
the history harder to see, and lib/*.c has never been a strict division (more
"one giant file seems a bit much"). The basic conversion to/from utf8 is
different from caring about the characteristics of unicode code points (which
the rest of utf8.c does), so having it in lib.c makes a certain amount of sense,
and I'm not strongly motivated to change it without a good reason.

It might happen eventually because I'm still not happy with the general unicode
handling design "yet", but that's a larger story.

Way back when there was "interestingtimes.c" for all my not-curses code, but it
was too long to type and mixed together a couple different kinds of things, so I
split it into utf8.c and tty.c both of which were shorter and didn't screw up
"ls" columnization. (I probably should have called it unicode.c instead, but
unicode is icky, the name is longer, and half the unicode stuff is still in libc
anyway).

Unicode is icky because utf8 and unicode are not the same thing. Ken Thompson
came up with a very elegant utf8 design and microsoft crapped all over it (cap
the conversion range, don't add the base value covered by the previous range so
there are no duplicate encodings) for no apparent reason, and then unicode just
plain got nuts. (You had an ENORMOUS encoding space, the bottom bit could
totally have been combining vs physical characters so we don't need a function
to tell, and combining characters should 100% have gone BEFORE the physical
characters rather than after to avoid the whole problem of FLUSHING them, and
higher bits could indicate 1 column vs 2 column or upper/lower/numeric so you
don't have to test with special functions like that, just collage them into
LARGE BLOCKS which is LESS SILLY than the whole "skipping 0xd800" or whatever
that is for the legacy 16 bit windows encoding space that microsoft CRAPPED INTO
THE STANDARD... Ahem.)

But alas, microsoft bought control of the unicode committee, so you need
functions to say what each character is, and those functions are unnecessarily
complicated. In theory libc has code to do wide char conversions already, but
glibc refuses to enable it unless you've installed and selected a utf8-aware
locale (which is just nuts, but that's glibc for you).

I made some clean dependency-free functions to do the simple stuff that doesn't
care what locale you're in, but there's still wcwidth() and friends that depend
on libc's whims (hence the dance to try to find a utf8 locale in main.c, and the
repeated discussion on this list between me and Elliott and Rich Felker about
trying to come up with portable fontmetrics code. Well, column-metrics. Elliott
keeps trying to dissuade me, but bionic's code for this still didn't work static
linked last I checked...)

Moving stuff around between files when I'm not entirely satisfied with the
design (partly depending on libc state and partly _not_ depending on it) doesn't
seem helpful.

> Also, the documentation (header comment) should probably mention that they
> store stuff as unicode codepoints,

Because I consistently attach comments before the function _body_ explaining
what the function does, instead of putting long explanations in the .h files
included from every other file which the compiler would have to churn through
repeatedly. In this case:

  // Convert utf8 sequence to a unicode wide character
  // returns bytes consumed, or -1 if err, or -2 if need more data.
  int utf8towc(unsigned *wc, char *str, unsigned len)

> I spent a while scratching my head at the fact wide characters are 4 byte
> ints when the maximum utf8 single character length is 6 bytes.

Because Microsoft broke utf8 in multiple ways through the unicode consortium,
among other things making 4 bytes the max:

http://lists.landley.net/pipermail/toybox-landley.net/2017-September/017184.html
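
The arithmetic behind that, as a rough sketch (both helper names are made up
for illustration and are not toybox functions): the Unicode ceiling U+10FFFF
needs 21 bits, which is exactly what a 4 byte sequence carries, and even the
31 bits of the original 6 byte form fit in a 32 bit unsigned.

```c
// Payload bits carried by an n-byte sequence in the original utf8 scheme:
// 7 for one byte, otherwise (7-n) bits in the lead byte plus 6 bits per
// continuation byte. Returns 0 for lengths outside 1..6.
int utf8_payload_bits(int bytes)
{
  if (bytes < 1 || bytes > 6) return 0;
  if (bytes == 1) return 7;

  return (7 - bytes) + 6 * (bytes - 1);
}

// Smallest sequence length whose payload can hold wc, or 0 if even the
// 6 byte form (31 bits) is too small.
int utf8_bytes_needed(unsigned wc)
{
  int bytes;

  for (bytes = 1; bytes <= 6; bytes++)
    if (!(wc >> utf8_payload_bits(bytes))) return bytes;

  return 0;
}
```

So utf8_bytes_needed(0x10FFFF) is 4 and utf8_payload_bits(6) is 31, which is
why a 4 byte integer is enough to hold any decoded value.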

In addition to the mailing list threads, I thought I blogged about this rather a
lot at the time:

  https://landley.net/notes-2017.html#29-08-2017
  https://landley.net/notes-2017.html#01-09-2017
  https://landley.net/notes-2017.html#19-10-2017

Which was contemporaneous with the above git commit that added the function to
lib/lib.c. I generally find that stuff by going "when did this code show up
and/or get 

[Toybox] scripts/prereq/build.sh

2024-04-05 Thread Rob Landley
I recently added scripts/prereq/build.sh which runs a "cc -I dir *.c" style
build against canned headers. Theoretically a portable build not requiring a
system to have any command line utilities except "cc" and a shell. (Ok, you
still need bash to run scripts/make.sh and scripts/install.sh until toysh is
promoted. And until I replace kconfig, you still need gmake to run "make
defconfig", but I've got a design for that one now.)

Both that build.sh script and the saved scripts/prereq/generated headers are
created by scripts/recreate-prereq.sh which figures out what commands a toybox
build uses out of the $PATH (by doing a defconfig build under
mkroot/record-commands.sh), makes a .config file with just those commands
enabled and all dependencies switched off (and hardwires the two not-android
not-mmu symbols that get compiler probed), then strips down the resulting
headers to have just the symbols those commands need. (Well, I haven't stripped
down config.h yet but all the OTHERS are hit with sed/grep to remove stuff for
the commands that aren't enabled.)

Of course when I ran it on macos it went "boing":

toys/other/taskset.c:52:17: error: use of undeclared identifier
'__NR_sched_getaffinity'
toys/other/taskset.c:81:15: error: use of undeclared identifier
'__NR_sched_setaffinity'
toys/other/taskset.c:119:29: error: use of undeclared identifier
'__NR_sched_getaffinity'
3 warnings and 3 errors generated.

It's trying to build nproc, which scripts/make.sh uses out of the $PATH to query
available processors. And yes, nproc calls sched_getaffinity() on linux (even
the debian one, according to strace) which isn't really portable...

In theory, I've got some workaround code for nproc being unavailable in
scripts/portability.sh already:

# Probe number of available processors, and add one.
: ${CPUS:=$(($(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null)+1))}

I'm uncomfortable leaning in to "linux else bsd/mac" because I was also thinking
about stuff like qnx and vxworks and so on with the new "canned" build, but if
all the probes fail that becomes CPUS=$((+1)) and thus sets it to 1, which
should still work if I filter out nproc and sysctl isn't there either?

But I'd also like to build nproc for other targets if I could. Which sounds like
it turns into a portability.c mess pretty quickly...
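
One possible starting point, offered as a sketch rather than a plan: as far
as I know sysconf(_SC_NPROCESSORS_ONLN) is not in POSIX but is available on
Linux, macOS and FreeBSD. The function name cpu_count() is made up here, and
unlike sched_getaffinity() this ignores the affinity mask, so it can report
more processors than the current process is allowed to use.

```c
#include <unistd.h>

// Report online processors where sysconf() knows how, else fall back to 1
// (the same "assume one CPU" answer the shell probe gives when every
// probe fails). Ignores CPU affinity.
long cpu_count(void)
{
  long count = -1;

#ifdef _SC_NPROCESSORS_ONLN
  count = sysconf(_SC_NPROCESSORS_ONLN);
#endif

  return count < 1 ? 1 : count;
}
```

A target with no such sysconf name still compiles and reports 1.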

Rob


[Toybox] Mkroot talk at texas linuxfest on the 13th.

2024-04-03 Thread Rob Landley
They posted the description. It's basically "45 minutes about mkroot":

https://2024.texaslinuxfest.org/talks/mkroot-tiny-linux-system-builder/

Rob


Re: [Toybox] more gnu nonsense: cp -n

2024-04-03 Thread Rob Landley
On 4/2/24 01:35, Ryan Prichard wrote:
> Apparently upstream coreutils "cp -n" changed between 9.1 and 9.2, and the
> Debian maintainers reverted the change temporarily(?) and also added the
> "non-portable" error.
> 
> In coreutils 9.1 and older, "cp -n" quietly skipped a file if the
> destination existed, but as of 9.2, it instead prints an error and exits with
> non-zero at the end. (I saw some stuff about "immediately failing" on the
> Debian bug, but AFAICT, cp keeps going and fails at the end.) It does look
> like the new 9.2+ behavior matches "cp -n" on macOS (14.3.1) (and probably
> FreeBSD but I didn't test that).

In toybox, I tend to repeat an option to get that sort of behavior, so I'd do:

  cp -n thingy... - skip files, no error
  cp -nn thingy... - skip files, with error

That way the existing behavior doesn't change, and old versions that don't
understand the doubling still provide the old behavior (because cp -n -n = cp -n
by default) without erroring out on an unknown flag or consuming more namespace.
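
A minimal sketch of that repeat-the-flag convention, assuming a hand-rolled
option scan: count_flag() and skip_mode() are hypothetical names, this is not
toybox's option parser, and it says nothing about how (or whether) toybox cp
implements -nn. Here argv points at the first argument after the command
name.

```c
// Count how often short option letter c appears before the first
// non-option argument, so "-n -n" and "-nn" both give 2. Long options
// ("--foo") are skipped and a bare "--" ends the scan.
int count_flag(char **argv, char c)
{
  int count = 0;
  char *s;

  for (; *argv && **argv == '-' && (*argv)[1]; argv++) {
    if ((*argv)[1] == '-') {
      if (!(*argv)[2]) break;      // "--" ends option parsing
      continue;                    // long option: not a flag cluster
    }
    for (s = *argv + 1; *s; s++) if (*s == c) count++;
  }

  return count;
}

// Map the count to behavior: 0 = overwrite, 1 = skip quietly,
// 2 = skip and report an error (further repeats act like 2).
int skip_mode(char **argv)
{
  int n = count_flag(argv, 'n');

  return n > 2 ? 2 : n;
}
```

A parser that only knows a boolean -n sees "-nn" as plain -n, which is the
backwards compatibility argument above.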

See toybox's "ls -ll" (shows nanoseconds) or "lsusb -nn" (numeric AND
non-numeric output) for examples. And yes, debian handles "ls -ll" just fine. :)

Rob


Re: [Toybox] more gnu nonsense: cp -n

2024-04-01 Thread Rob Landley
On 4/1/24 10:31, enh via Toybox wrote:
> hadn't seen this one before...
> 
> cp: warning: behavior of -n is non-portable and may change in future;
> use --update=none instead
> 
> (consider me skeptical that a system without -n is going to have
> --update=none...)

Define non-portable? Freebsd 14 has -n, macos has -n, busybox cp has -n, and of
course toybox (and thus android) has -n.

Meanwhile:

$ ./busybox cp --update=none one two
cp: option '--update' doesn't allow an argument
root@freebsd:~ # cp --update=none one two
cp: illegal option -- -
root@freebsd:~ # cp --update=none one two
cp: illegal option -- -
$ toybox cp --update=none one two
cp: Unknown option 'update=none' (see "cp --help")

Those clowns are explicitly advocating for a LESS portable option.

This is why I'm not removing "egrep", which is a shell wrapper on my devuan
system by the way:

$ which egrep
/bin/egrep
$ cat /bin/egrep
#!/bin/sh
exec grep -E "$@"

At least THAT one is easy for distributions to keep doing regardless of
gnu/stupid.

If the solution for cp -n isn't "distro patches out the stupid", then "install
busybox cp" or just "use alpine". Spurious warnings from gnu are just that:
spurious.

Rob


Re: [Toybox] Microsoft github took down the xz repo.

2024-03-30 Thread Rob Landley
On 3/30/24 15:16, Oliver Webb wrote:
> On Saturday, March 30th, 2024 at 15:06, Rob Landley  wrote:
>> FYI, Microsoft Github disabled the xz repository because it became
>> "controversial" (I.E. there was an exploit in the news).
>> 
>> https://social.coop/@eb/112182149429056593
>> 
>> https://github.com/tukaani-project/xz
> 
> They couldn't have removed commit access for the trojan horse and got on with
> their lives?

Mastodon's been talking about this at length all day:

  https://mstdn.social/@rysiek/112184610302366603
  https://hachyderm.io/@dalias/112182128889536710
  https://cyberplace.social/@GossiTheDog/112184645230558304
  https://social.secret-wg.org/@julf/112184194797977290
  https://mastodon.social/@richlv/112180479433832095

And a lot of things the discussion was linking to went away. Oh well...

>> I'm assuming if toybox ever has a significant bug, microsoft would respond by
>> deleting the toybox repository. There's a reason that I have
>> https://landley.net/toybox/git on my website, and my send.sh script pushes to
>> that before pushing to microsoft github.
> 
> As much as it doesn't matter, I've wondered what git web frontend you use,
> The html source for the massive table of commits doesn't give a copyright
> notice.

https://github.com/landley/toybox/blob/master/scripts/git-static-index.sh

https://landley.net/notes-2022.html#22-12-2022

> Do you just have a script make
> a table out of "git log"? Furthermore, have you considered using cgit or
> gitea or another fancier git frontend for your own site?

I engaged with cgit at one point and found it overcomplicated and unpleasant.

I set up gitea for Jeff on a j-core internal server, and it was fine except it
used a BUNCH of memory and cpu for very few users. Running cgi on dreamhost's
servers is a bother at the best of times (I don't want to worry about exploits),
and the available memory/CPU there is wind-up toy levels.

My website is a bunch of static pages rsynced into place, some of which use
xbithack to enable a crude #include syntax, and that's about what the server can
handle.

> There is also the issue of you not being able to push commits to the github
> repo because github is forcing everyone to use 2FA.

I haven't been hit by that yet for some reason. I push from the command line
anyway (which is basically ssh), so if I lost website access I could presumably
still update the README to let people know where to go.

Rob


Re: [Toybox] Microsoft github took down the xz repo.

2024-03-30 Thread Rob Landley
On 3/30/24 15:11, Rob Landley wrote:
> upstream of the xz-embedded repo with the public domain code I cloned is:
> 
>   https://git.tukaani.org/xz-embedded.git
> 
> Which is still available.

Although now that I look at it, a5390fd368f8 in september is the last commit
that wasn't from the backdoor guy anyway, so nothing new of interest.

Rob


[Toybox] Microsoft github took down the xz repo.

2024-03-30 Thread Rob Landley
FYI, Microsoft Github disabled the xz repository because it became
"controversial" (I.E. there was an exploit in the news).

  https://social.coop/@eb/112182149429056593

  https://github.com/tukaani-project/xz

I'm assuming if toybox ever has a significant bug, microsoft would respond by
deleting the toybox repository. There's a reason that I have
https://landley.net/toybox/git on my website, and my send.sh script pushes to
that _before_ pushing to microsoft github.

Luckily the xz guys don't seem to trust microsoft github either, because the
upstream of the xz-embedded repo with the public domain code I cloned is:

  https://git.tukaani.org/xz-embedded.git

Which is still available.

Rob


Re: [Toybox] [PATCH] Clean up xz a good amount

2024-03-29 Thread Rob Landley
On 3/29/24 17:50, Oliver Webb wrote:
>> > ah, crap, that's another thing to put on the riscv64 to-do list...
>> > (thanks for bringing that to light!)
>> 
>> so, TIL that upstream already added a risc-v bcj implementation...
> 
> I always thought that the xz decompressor we use in toybox ("xz-embedded") and
> the main one (The one with the CVE) were different projects (Separate git
> repos, one is much slower than the other, etc).

The exploit was somebody checked a "test case" into the build system that hacked
the rest of the build with an x86-64 binary blob that linked before the other
functions?

https://youtu.be/jqjtNDtbDNI

I was only halfway paying attention once I was sure it didn't affect toybox. My
systems here use dropbear for ssh anyway, yes including my laptop. :)

> That being said, There are 0BSD licensed parts in the xz repo
> (one of SIX different licenses).

Huh, really? Cool...

>> (rob will of course be delighted to hear of systemd's involvement in
>> the exploit chain :-) )
> 
> Who would've known that an over-complicated, extremely large hairball with a
> massive dependency chain that tries to consume _everything_ makes it easy to
> perform exploits.

(Deleted long grumbling about how adding complexity probably means you're
_reducing_ security because the system is less auditable: a signing chain of
custody is still GIGO, it just means it was delivered to you by TIVO with a
mandatory EULA so you can't personally FIX it...)

Ahem. Tangent. Not going there.

Rob


Re: [Toybox] [PATCH] Clean up xz a good amount

2024-03-29 Thread Rob Landley
On 3/29/24 17:28, enh wrote:
> On Wed, Feb 28, 2024 at 9:13 AM enh  wrote:
>> > > @@ -639,6 +640,20 @@ enum xz_ret xz_dec_bcj_run(struct xz_dec_bcj *s, 
>> > > struct xz_dec_lzma2 *lzma2,
>> > >   */
>> > >  enum xz_ret xz_dec_bcj_reset(struct xz_dec_bcj *s, char id)
>> > >  {
>> > > +  switch (id) {
>> > > +  case BCJ_X86:
>> > > +  case BCJ_POWERPC:
>> > > +  case BCJ_IA64:
>> > > +  case BCJ_ARM:
>> > > +  case BCJ_ARMTHUMB:
>> > > +  case BCJ_SPARC:
>> > > +break;
>> > > +
>> > > +  default:
>> > > +/* Unsupported Filter ID */
>> > > +return XZ_OPTIONS_ERROR;
>> > > +  }
>> > > +
>> > >s->type = id;
>> > >s->ret = XZ_OK;
>> > >s->pos = 0;
>>
>> ah, crap, that's another thing to put on the riscv64 to-do list...
>> (thanks for bringing that to light!)
> 
> so, TIL that upstream already added a risc-v bcj implementation...

I'm happy to call the public domain repo our "upstream" for this, but there's
still some collation damage (they have many files and we want either one or
two), and a lot of cleanup that could be done in our code that moves it farther
from their code.

As for whether we want one file or two: one model is the engine in the command
ala toys/*/bzcat.c and the other is lib/deflate.c called by toys/*/gzip.c (but
also available for other things to pull in without having to fork a child
process and pipe data through it). But the real difference there is deflate has
half an inflate already that I REALLY SHOULD FINISH (dictionary selection and
resets, everything else is just a question of doing the work) and xz compression
seems a bit out of scope. (Being able to read everything: yay. Being able to
compress data, gzip is the 80/20.)

Modulo busybox refuses to build without bzip2 compression (I hit it until it
confessed in mkroot/packages/busybox.c but that broke all the help text), and I
did WRITE a cleaned up bzip2 engine many moons ago (reposted it here not too long
ago I think), so I _could_ have a lib/bzip2.c with a compression side if I
wanted to? Modulo the bzip2 compression side string sort logic never made sense
to me (what is the logic of falling back from one sort mechanism to the next,
why those in that order with those thresholds) so to test my engine I had to
block copy the original sort logic, which has licensing issues...

> ...but i only learned that because i was looking at
> https://www.openwall.com/lists/oss-security/2024/03/29/4 which was
> fascinating in many ways.
> 
> (rob will of course be delighted to hear of systemd's involvement in
> the exploit chain :-) )

I saw a youtube video on it, and it's been all over mastodon today. So much
unnecessary complexity. Adding layers to "solve" problems is painting over dry
rot. There are reasons I also want to simplify the build system itself, and care
so much about comparing the behavior across multiple platforms...

Rob


Re: [Toybox] Poke about the bc.c cleanup patches I submitted a while ago

2024-03-28 Thread Rob Landley
On 3/27/24 08:31, Rob Landley wrote:
>>> ipcrm, ipcs,
...
>> I don't know how I'm supposed to test resources I have no way to create,
>> we'll need ipcmk eventually. These seem more feasible to test, although
>> their tests will fail under mkroot until we
>> have ipcmk
...
> To be honest, I'm tempted to clean up and promote them to "examples". Leaving
> them "default n". There in case somebody needs it, but if so it would be nice
> if they could send us a note letting us know they exist...

I did a quick cleanup pass on ipcrm, but... yeah, I have no idea how to test
this?

Also... what ARE keys vs IDs? I thought ID was a number and a key would be
arbitrary strings, because a key gets washed through a lookup function and an id
is just strtol(), but the code that's there does:

function(int key, char *name...)
{
...
  id = strtol(name, &c, 0);
  if (*c) {
    error_msg("invalid number :%s", name);
    return;
  }

  if (key) {
    if (id == IPC_PRIVATE) {
      error_msg("illegal key (%s)", name);
      return;

IPC_PRIVATE is zero. So even if you set "key" to 1, strtol() has to consume the
whole thing or you get "invalid number" error and an abort before it even checks
key. There's no !key test around that first bit. And then right afterwards it
checks if the strtol() it did returned zero (IPC_PRIVATE is zero) and barfs if
it did, so even if that first part WAS a thinko with a missing test, it still
wouldn't work for anything that didn't at least START with a nonzero number.

So what's a "key"?
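
To make the order of those checks concrete, here is a small sketch that
mirrors them. It is not the ipcrm source: check_name() and its return codes
are invented for illustration, and IPC_PRIVATE is hardwired as 0 (its Linux
value) rather than pulled from <sys/ipc.h>.

```c
#include <stdlib.h>

// Mirror the check order described above. Returns 0 if name would be
// accepted, 1 for the "invalid number" path, 2 for the "illegal key" path.
int check_name(int key, const char *name)
{
  char *end;
  long id = strtol(name, &end, 0);

  // Trailing non-numeric text is rejected for keys and ids alike,
  // because this test runs before key is ever looked at.
  if (*end) return 1;

  // A key that parsed to zero matches IPC_PRIVATE (assumed 0 here).
  if (key && id == 0) return 2;

  return 0;
}
```

So check_name(1, "mykey") takes the "invalid number" path and
check_name(1, "0") takes the "illegal key" path: in this arrangement a "key"
only gets through if it is a nonzero number.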

I did a "git log */ipcrm.c" over in busybox and there hasn't been a patch to it
from an actual USER of this command since it was introduced.

It's all code size shrink, compiler flag damage, white space fixes, help text
style updates, annotating with size estimates, NOEXEC, "make GNU licensing
statement forms more regular", "use can't instead of cannot", using EXIT_SUCCESS
and EXIT_FAILURE macros (really???), whatever "strtoul() fixes" was, and so on.
Churn for being a busybox applet, global search and replace over the tree.

No actual _user_ of the code has touched it since it was added to the tree, and
it turns out that was MY fault:

  commit 6eb1e416743c597f8ecd3b595ddb00d3aa42c1f4
  Author: Rob Landley 
  Date:   Mon Jun 20 04:30:36 2005 +

      Rodney Radford submitted ipcs and ipcrm (system V IPC stuff).  They could
      use some more work to shrink them down.

And in my defense, I had no idea what they WERE back then. That whole mess
started with a poke from some Qualcomm developers from India:

  http://lists.busybox.net/pipermail/busybox/2005-June/048807.html

Which led to a newbie looking for something to do asking how you submit new
commands to the project:

  http://lists.busybox.net/pipermail/busybox/2005-June/048828.html

And then two other devs piping up to show interest:

  http://lists.busybox.net/pipermail/busybox/2005-June/048847.html
  http://lists.busybox.net/pipermail/busybox/2005-June/048848.html

Which led to the patch.

So three people showed interest in 2005, resulting in a new dev porting the
commands from util-linux-2.12a, but none of them actually submitted anything
like a test case:

  http://lists.busybox.net/pipermail/busybox/2005-June/048851.html

So I have what I think is a cleaned up version but can't prove I didn't break
it, and I have no idea if 19 years after it was added to busybox and then (as
far as I can tell) completely ignored... anyone still cares?

Rob


Re: [Toybox] [PATCH] modeprobe.c and last.c: Codeshare identical llist_add()

2024-03-27 Thread Rob Landley
On 3/26/24 15:05, Oliver Webb via Toybox wrote:
> 2 identical versions of the same function, variable names and everything
> 
> 31 bytes saved in bloatcheck

The problem being it moves code from pending/ to lib/ whose only users are in
pending.

I've generally just done singly linked list additions inline. When you don't
mind reversing the list order it's literally two assignments and a dereference:

  node->next = head;
  head = node;

Pushing two arguments onto the stack and making a function call is approximately
as much code. (When I want to preserve list order I tend to use the existing
doubly linked list functions.)
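
For anyone who wants to see the inline version in context, a small sketch
(struct node and link_reversed() are hypothetical, not toybox types or
functions): the loop body is the same two assignments, and the resulting list
comes out in reverse insertion order.

```c
// Hypothetical node type for illustration only.
struct node {
  struct node *next;
  int value;
};

// Link an array of nodes into a singly linked list by inline front
// insertion. Returns the head; list order is the reverse of array order.
struct node *link_reversed(struct node *nodes, int count)
{
  struct node *head = 0;
  int i;

  for (i = 0; i < count; i++) {
    nodes[i].next = head;   // new node points at the old head
    head = nodes + i;       // new node becomes the head
  }

  return head;
}
```

Linking nodes holding 1, 2, 3 gives a list that walks 3, 2, 1.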

Rob


Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free

2024-03-27 Thread Rob Landley
On 3/25/24 20:24, enh wrote:
> But "dpkg-query -S $(which $NAME)" is pretty easy to do the mapping
> yourself on debian...
> 
> 
> (yeah, though i suspect anyone trying to do this hypothetical "swap package $X
> for toybox" would want the _opposite_ mapping, from package name to all the
> commands. and i don't know of a way to ask apt that question?

  $ dpkg-query -L tar | grep bin/
  /bin/tar
  /usr/sbin/rmt-tar
  /usr/sbin/tarcat

> other than brute-forcing all of the executables in all of the directories in
> $PATH, anyway.)

Checking the $PATH would be clever but the above covers it for me.

There are some insane packages which crap binaries under /usr/lib, such as
/usr/lib/libreoffice/program/oosplash or /usr/lib/man-db/manconv and generally I
consider these packages to be maintained by madmen.

I mean honestly:

  $ cat /usr/bin/7z
  #! /bin/sh
  exec /usr/lib/p7zip/7z "$@"

Why would you do that? Why would ANYONE voluntarily DO that?

Rob


Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free

2024-03-27 Thread Rob Landley
On 3/25/24 20:20, enh wrote:
> On Sun, Mar 24, 2024 at 12:45 AM Rob Landley  wrote:
> On 3/22/24 10:24, enh wrote:
> > On Thu, Mar 21, 2024 at 8:45 PM Rob Landley  wrote:
> >> Anyway, toys/android basically meant (to me), "commands that come from
> >> and are maintained by Elliott which I can't even test because they don't
> >> apply to a vanilla linux system that isn't running the full android
> >> environment". Although that's a personally idiosyncratic definition
> >> because I lumped selinux in with that;
> >
> > (heh. you beat me to it :-) )
> 
> If the new kconfig greyed out unavailable entries and had a status line
> saying "depends on TOYBOX_ON_ANDROID" or similar when you cursored over a
> greyed out entry...
> 
> ah, as the kind of lunatic who only ever edits these files by hand with vi,
> i'd actually just assumed that was kind of the whole point of the _existing_
> kconfig stuff?

To me half the point is it's the same UI as configuring the linux kernel,
busybox, and buildroot. Meaning A) a bunch of people out there are familiar with
it already, B) presumably the worst sharp edges have been filed off over the
past 15 years.

> (to be fair,  i did launch it once, but saw it was a ridiculously deeply
> nested ui [and not expanded by default?], and thought "i don't understand the
> purpose of this", couldn't see how to search,

It literally has help text at the top of the screen.

Forward slash is search, cursor up and down, space to toggle the highlighted
thingy, enter to go into a menu, ESC to back out again, ESC from the top level
to exit (it prompts you whether or not you want to save), ESC twice from _that_
to abort the exit.

There's also a menu at the bottom, where if you cursor left and right it
highlights different things, and then ENTER will do that thing instead. (The
default is "select". I cursor right to "help" and hit enter because I never
remember that ? is the hotkey for that.)

Mostly I'm assuming "same UI as linux kernel" is like 2/3 of the userbase 
though.

> and immediately went back to editing by
> hand. at least that way i only need to know how to use my editor, which i need
> to know regardless :-) )

Dependency resolution comes to mind.

>> If we really wanted to rush this, I could make a TOYBOX_UNFINISHED symbol that
>> the pending stuff could depend on, and then the blocker is the kconfig
>> replacement...
>
> no, i've been cursing the broken tab-complete for -- wow, almost a decade now!
> -- so i think i can survive :-)

I admit I sometimes do "ls toys/*/skel*" when I can't remember whether I called
it "example" or "examples".

>> Not THIS release though. Working on release notes! (And lowering my standards
>> on the todo list.)
>
> indeed... something that benefits the handful of folks working on toybox isn't
> worth much compared to something that benefits the users!

Working on it...

Rob


Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free

2024-03-27 Thread Rob Landley
On 3/24/24 10:15, Oliver Webb wrote:
> On Sunday, March 24th, 2024 at 04:09, Rob Landley  wrote:
>> On 3/24/24 01:00, Oliver Webb wrote:
> 
>> This isn't the hard part. To me, the hard part is wanting to share lib/*.c code
>> with this new binary, which implies it would live in toys/example/*.c, which
>> means in the NEW design it would be a normal command that's "default n"... and
>> maybe depends on TOYBOX_BUILD or some such? Except moving stuff from scripts/*.c
>> to toys/*.c is conceptually ugly. But if we're getting rid of the
>> subdirectories... Maybe make.sh needs to be able to build commands that DON'T
>> live in toys/ but then...
> 
> There is a chicken and egg problem with the build infrastructure and kconfig 
> being a toy,

Yes, I know. That's why I've avoided it up until now.

> We need a .config file to build toys, and parsing the help text requires some 
> kconfig
> parser, But we can't make a .config file until we have kconfig.

You don't need a .config file to build lib/*.c (policy, and why lib/lib.h is
separate from toys.h).

I've talked before about doing packaged minimal headers to build "sed" and "sh"
standalone, as part of toybox airlock stuff. (Possibly the full airlock command
list needed to build toybox.) Ones that assume all the config probes failed and
$LIBRARIES is empty and so on.

The EASY way to do that is to have a scripts/shipped/generated with handcrafted
header files, and then stick -I scripts/shipped at the start of $CFLAGS.

The hard way involves more cleanup so there are fewer header entry points, and I
could have a single "hairball.h".

The individual toys/*/*.c files only #include toys.h and generated/flags.h
(which could be a re-include of toys.h with a little #ifdef cleverness). The
rest are all included from toys.h and main.c.

The above #ifdef cleverness could wrap the generated/ includes in toys.h in a
__has_include("hairball.h") or similar, so I could provide a single replacement
file with the collated stuff I need for specific commands to build in a way that
assumes the host system has no brain, and then generated/build.sh it. The
problem is the includes are in two places: "generated/config.h" comes before
lib/portability.h (which comes before everything else so it can override
standard header #includes), and then the rest of the generated/*.h are after a
#define NEWTOY() and OLDTOY() so there's some reordering to do to combine them
all into one header.

(I don't think any of the generated/*.h files care about stuff in portability.h?
There would be various structs used before they were defined in
generated/globals.h if it got moved up, but that _should_ be ok? Nothing takes
the sizeof() them or similar that early. Eh, I should be able to work it out,
just haven't sat down to try yet. Far too many already open cans of worms...)

And then there's the #includes in main.c, the other half of the
declaration/definition pair for various global data: that needs newtoys.h,
help.h, and zhelp.h. One of which is already chopped out by a config option, the
second of which just needs a way to stub it to "", which leaves newtoys.h to
address...

> The solution I thought of was to use the infrastructure that we will have to 
> have to remove
> bash and gsed dependencies to build kconfig as a early step in the process.

No.

>  But then we will
> still need to extract the help text.

config TOYBOX_HELP
bool "Help messages"
default y
help
  Include help text for each command.

You can configure help out entirely. (This isn't CONFIG_HELP the command, this
is "the help subsystem" in the toybox general settings menu.) This was 
intentional.

> Do you plan on not keeping 2 different kconfig parsers or moving scripts/*.c 
> to toys/example

Look up at the first paragraph of mine you quoted in this email.

It's an open question, but stripping down a "cc -I scripts/prebuilts main.c
lib/*.c toys/*/{abc,def,ghi,jkl}.c"  build so it could provide commands with
nothing but a compiler would be a step towards that.

Modulo that "cc *.c" doesn't parallelize across processors because C++
developers took over compiler development about 2 years after the Core Duo hit
the market and brought SMP to the cheap retail mainstream, at which point making
compilers better rather than merely more complicated hit a sudden brick wall.

And thus even on my ancient 4x laptop:

$ time make clean defconfig toybox
...
real    0m16.170s
$ time generated/build.sh
...
real    0m27.474s

I don't want to significantly slow down the build by compiling prerequisites? In
theory:

$ time gcc -I . main.c lib/*.c -o blah
...
real    0m1.780s

(Yeah, it exits with a link error, but that's not the point.)

And I mean yeah, 2 seconds, not that big a deal. But I'd pr

Re: [Toybox] hexdump tests.

2024-03-27 Thread Rob Landley
On 3/25/24 10:42, enh wrote:
> On Sun, Mar 24, 2024 at 1:40 AM Rob Landley  wrote:
>>
>> On 3/22/24 15:02, enh wrote:
>> >> > CANONICALIZE_SPACE_IF_RUNNING_HOST_VERSION=1? so we trust ourselves but 
>> >> > no-one
>> >> > else? :-)
>> >>
>> >> I _don't_ trust myself, and I'm not special. (That's policy.)
>> >
>> > yeah, but that's why i suggested
>> > CANONICALIZE_SPACE_IF_RUNNING_HOST_VERSION --- that way we can say "we
>> > can't make hard assertions about the _host's_ whitespace, but we can
>> > still make hard assertions about _ours_". if we just canonicalize all
>> > the whitespace all the time, we can't (say) ensure that columns line
>> > up or whatever.
>>
>> Or we could just "NOSPACE=1 TEST_HOST=1 make tests" if that's the test we 
>> want
>> to run...?
> 
> it's not though. that's my point. there are several cases:
> 
> 1. testing toybox --- we know what whitespace we're expecting to
> produce, and want tests to protect against regressions.
> 
> 2. testing host tools --- we _don't_ have control over what whitespace
> the host produces.
>   a) in some cases we manually mark individual tests to show "we don't
> care about host whitespace for this test case".
>   b) sometimes this applies to _all_ the tests for a toy.
> 
> we're talking about case 2b here, which is currently the
> least-well-supported variant.

You can NOSPACE=1 in an individual tests/command.test and it should last until
the end of the file? That's why scripts/test.sh does:

  # Run command.test in a subshell
  (. "$1"; cd "$TESTDIR"; echo "$FAILCOUNT" > continue)

So the variables and functions and so on defined in one test don't leak into
others. I spent like 3 commits getting that to work properly, the last of which
was commit 07bbc1f61280 and mentions the previous 2.

> i think we're talking at cross purposes because _i'm_ talking about
> variables set _within the tests, by the tests themselves_ and you're
> talking about variables set on the command-line, which i don't think
> make any sense here, because we're talking about properties of the
> individual tests/commands.

There are three scopes:

1) Variables exported into all tests

POTATO=1 make tests

2) Variables set for a single test:

POTATO=1 testcmd "thingy" "-x woo" "expected\n" "file" "stdin"

3) Variables set for the current test file.

[ -n "$TEST_HOST" ] && NOSPACE=1

That's just a normal assignment (or export) in a tests/file.test. These go away
at the end of the current file (because of the above parenthetical subshell
calling it), which was the new thing I added in 2022.
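
To see the third scope in isolation, here is a minimal sketch. It assumes a
throwaway file called demo.test (made up for this illustration, not a real
toybox test) and uses command substitution as the subshell instead of the
exact line from scripts/test.sh:

```shell
# Build a stand-in for a tests/command.test file in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/demo.test" << 'EOF'
NOSPACE=1
echo "inside: NOSPACE=${NOSPACE:-unset}"
EOF

unset NOSPACE

# Command substitution runs in a subshell, so the assignment made while
# sourcing demo.test cannot leak back into this shell.
inside=$(. "$tmp/demo.test")
outside="outside: NOSPACE=${NOSPACE:-unset}"

echo "$inside"
echo "$outside"
rm -rf "$tmp"
```

The first line reports NOSPACE as 1 and the second reports it as unset, which
is the "goes away at the end of the file" behavior described above.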

I remember that in my first attempt at this years ago ctrl-c didn't work
reliably, but the fix to that was just a trap at the top of scripts/test.sh:

  # Kill child processes when we exit
  trap 'kill $(jobs -p) 2>/dev/null; exit 1' INT

> (unless you really do want to say "there's absolutely nothing we can
> do about host whitespace, so give up completely", which i think has
> yet to be proven that it's _that_ bad. but there are commands where
> having a test that says "this whitespace -- that toybox produces -- is
> reasonable [but as long as the non-whitespace matches, and there's
> _some_ whitespace everywhere we have whitespace, we'll accept any
> whitespace from the host tool]".)

I think per-command [ -n "$TEST_HOST" ] && NOSPACE=1 might be reasonable. I'd
rather not blanket do it for all commands.
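
As a rough illustration of what "accept any whitespace from the host tool"
could mean, here is a sketch that compares two made-up output lines exactly
and then again after squeezing blanks. The canon helper and both sample
strings are assumptions for this example; this is not how scripts/test.sh
implements NOSPACE:

```shell
# canon: hypothetical helper that turns tabs into spaces, then squeezes
# each run of spaces down to a single space.
canon() { printf '%s\n' "$1" | tr '\t' ' ' | tr -s ' '; }

ours='0000000  61 62 63'   # pretend toybox output
host='0000000 61  62 63'   # pretend host output, same fields, other spacing

if [ "$ours" = "$host" ]; then exact=same; else exact=differ; fi
if [ "$(canon "$ours")" = "$(canon "$host")" ]; then
  loose=same
else
  loose=differ
fi

echo "exact: $exact, canonicalized: $loose"
```

The exact comparison is what protects column alignment in toybox's own
output, and the canonicalized one is what you would fall back to under
TEST_HOST.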

Rob


Re: [Toybox] Poke about the bc.c cleanup patches I submitted a while ago

2024-03-27 Thread Rob Landley
Yesterday I did NOT spend all my energy reading email, and instead got
https://landley.net/bin/toolchains updated with a musl 1.2.5 and or1k and riscv
in the list, and that seems to have fixed the sh2eb build break as well
(although I haven't tried booting it on a Turtle board yet, haven't unpacked any
here in Minneapolis...) and rebuilt all the mkroot targets against the 6.8
kernel (the tmpfs patch went upstream-ish but the rest all still apply, none of
those issues will ever voluntarily be fixed by the kernel clique), and the tests
told me I need kernel/qemu configs for armv4l armv7m microblaze mips64 riscv32
riscv64 sh4eb, which reminded me of my "make the fdpic loader work on sh4 with
mmu work" which should become another patch and get finished now that I've got
updated toolchains with the sh4 longjmp bug fixed...

But today I'm being good and back to spending my energy responding to email 
instead.

On 3/24/24 21:45, Oliver Webb wrote:
> On Sunday, March 24th, 2024 at 18:27, Rob Landley  wrote:
> 
>> > I've been looking to do a cleanup pass on bc because there are a lot of 
>> > very obvious things
>> > that can be removed (typedefed structs as far as the eye can see, all the 
>> > "posixError" garbage,
>>
>> Agreed. I still haven't decided whether to throw it out and start over, but 
>> you
>> can't make it worse. (Your cleanup patch broke xzcat, but I can't tell if 
>> this
>> one is right or wrong outside of its test suite already, and only really care
>> about the kernel timeconst.bc use case anyway, so...)
> 
> Permission to remove the annoying signal handling that only really matters 
> (gets in the way of exiting) 
> on interactive sessions?

"You can't make it worse."

>> Why typecast at all? You're assigning to a variable of that size, shouldn't the
>> assignment do the typecast? (Does this suppress a warning or something?)
> 
> I did ":%s/uchar/char/g" instead of going over every individual use of 
> "uchar",
> This patch (attached) removes a lot of those unnecessary typecasts, and 
> cleans up
> the code formatting a lot, among other things like getting rid of the 
> posixError stuff,
> about 350 lines removed
> 
>> Is sizeof(char) ever not 1?
> 
> There is support for multi-byte chars in gcc (i.e. "char x = 'ABCD';")

That's a character literal (which has type int), not a char variable.
Assigning it to a char will give you... I'm going to guess 'D'.

> but no one uses that terrible extension to my knowledge

It seems to warn about using it by default, even:

$ cat test2.c
#include <stdio.h>

int main(int argc, char *argv[])
{
  char c = 'ABCD';

  printf("%d\n", c);
}
$ gcc test2.c
test2.c: In function ‘main’:
test2.c:5:12: warning: multi-character character constant [-Wmultichar]
   char c = 'ABCD';
            ^~
test2.c:5:12: warning: overflow in conversion from ‘int’ to ‘char’ changes value
from ‘1094861636’ to ‘68’ [-Woverflow]
$ ./a.out
68
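
The two numbers in that warning are consistent with gcc packing the four ASCII
codes into an int with the first character in the highest byte, and the
conversion to char keeping only the lowest byte. Multi-character constants are
implementation-defined, so treat this as a sanity check of the arithmetic
(assuming 8-bit chars) and not a statement about every compiler:

```shell
# 'A' 'B' 'C' 'D' are 0x41 0x42 0x43 0x44 in ASCII.
packed=$(( (0x41 << 24) | (0x42 << 16) | (0x43 << 8) | 0x44 ))

# Converting to an 8-bit char keeps only the low byte, which is 'D'.
low=$(( packed & 0xff ))

echo "$packed $low"
```

That prints 1094861636 and 68, matching the warning text and the program's
output.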

>> > or the xz stuff,
>>
>> If you want to peel out individual upstream public domain xz patches and 
>> adapt
>> them (one at a time) to apply to toybox's xzcat, I'd be very interested in
>> reading and applying the results.
> 
> The main problem is that it takes a lot of work to patch upstream stuff and 
> not break everything,
> I'll see what I can do, but I can't guarantee that I'll be able to get the 
> bigger blocks of code
> like the ARM64 decoder in.
> 
>> > nor the csplit regressions I started to patch out,
>>
>> What were the csplit regressions?
> 
> A lot of things since I was testing the command manually when I first wrote 
> it,

A test suite that TEST_HOST passes would be nice. I have the start of one, but
csplit is such an utterly terrible command (a half-assed sed that only wants to
write to files), I can't wrap my head around what anybody would ever WANT to use
it for.

I mean why have "prefix" and "suffix" when suffix is an arbitrary sprintf
string? Prefix on WHAT, it's not adding in the input filename, and you can't if
you try:

$ seq 1 10 | csplit - 2 %4% 7 -b '%s'
csplit: invalid conversion specifier in suffix: s

I checked busybox to see if they had tests, but the only mention of csplit in
the entire git tree there is docs/posix_conformance.txt under "Tools not 
supported".

>> Glancing at pending, I don't have a test environment for
>> arp, arping,
> 
> Networking administration stuff for ARP caches that can manipulate kernel 
> ARP table entries,
> would probably require mkroot to test safely.

Yes, I know.

>> bootchartd,
> 
> A command with no standard; Described as "bootchartd is commonly used to 
> profile the boot process.&

Re: [Toybox] test.sh: Don't override "C" command path in TEST_HOST if it's set

2024-03-24 Thread Rob Landley
On 3/24/24 18:40, Rob Landley wrote:
>> Also, different command names, there's a dozen different vi implementations 
>> and 
>> only a few have the name "vi". This is true for some other commands as well
> 
> I've been doing:
> 
>   mkdir sub
>   ln -s $(which potato) sub/vi
>   PATH=$PWD/sub:$PATH make tests
> 
> Comes up a bit already, such as testing toybox tar --xform which requires 
> toybox
> sed, and thus even the standalone test skips those unless you put toybox sed 
> in
> the $PATH.
> 
> In theory you could PATH=$PWD/sub:$PATH TEST_HOST=1 make test_vi above, in 
> which
> case "C" should wind up pointing into sub...

P.S. I don't want to commit to there still BEING a "C" a year from now. That's
an internal implementation detail, not an API.

Rob


Re: [Toybox] test.sh: Don't override "C" command path in TEST_HOST if it's set

2024-03-24 Thread Rob Landley
On 3/22/24 16:10, Oliver Webb wrote:
>> On 3/21/24 21:38, Oliver Webb via Toybox wrote:
>> 
>> > A mildly annoying issue if you are trying to test with different 
>> > implementations of commands
>> > such as plan9 ones or sbase or busybox ones, things with different 
>> > conflicting implementations
>> > of things like xxd or vi. With this patch you can do "make test_cmd 
>> > TEST_HOST=1 C=/path/to/other/cmd"
>> > and have it work
>> 
>> I've been doing "PATH=/path/to/thingy:$PATH TEST_HOST=1 make test_cmd" for
>> years, I didn't know that needed to be documented...
> 
> plan9 has an incompatible diff implementation, which means to test plan9 utils 
> I'd
> either need to separate diff from the rest of the binaries or have some way 
> of overriding "C".
> 
> Also, different command names, there's a dozen different vi implementations 
> and 
> only a few have the name "vi". This is true for some other commands as well

I've been doing:

  mkdir sub
  ln -s $(which potato) sub/vi
  PATH=$PWD/sub:$PATH make tests

Comes up a bit already, such as testing toybox tar --xform which requires toybox
sed, and thus even the standalone test skips those unless you put toybox sed in
the $PATH.

In theory you could PATH=$PWD/sub:$PATH TEST_HOST=1 make test_vi above, in which
case "C" should wind up pointing into sub...
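
A self-contained way to convince yourself the symlink trick resolves the way
you expect is sketched below. The "potato" script and the scratch directory
layout are invented for the example, and it only checks $PATH lookup, not
anything about how the test suite sets "C":

```shell
work=$(mktemp -d)
mkdir "$work/impl" "$work/sub"

# Stand-in for some alternate implementation that isn't named "vi".
printf '#!/bin/sh\necho "potato pretending to be vi"\n' > "$work/impl/potato"
chmod +x "$work/impl/potato"

# Give it the name the tests will look for.
ln -s "$work/impl/potato" "$work/sub/vi"

# Prepend sub/ inside a subshell so the change doesn't outlive the lookup.
resolved=$(PATH="$work/sub:$PATH"; command -v vi)
output=$(PATH="$work/sub:$PATH"; vi)

echo "$resolved"
echo "$output"
rm -rf "$work"
```

Because sub/ comes first in $PATH, "vi" resolves to the symlink even if the
host has its own vi installed somewhere later in the search path.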

Rob


Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free

2024-03-24 Thread Rob Landley
On 3/24/24 01:00, Oliver Webb wrote:
> I've done some research on this too, we have no "select" statements in any of 
> our config symbols,

for a definition of "we" that is "I have intentionally not merged any", since I
review and approve all the kconfig command sections in the headers and have been
tracking that. (At one point the config2help.c stuff was trying to stitch
dependencies together to merge help text, and didn't understand complicated 
syntax.)

That said, forking the kconfig language definition is not something I do
lightly. Ours has fallen way behind the kernel's, and thus looks like something
else but is only compatible with a subset of it. We are about to _shrink_ that
subset. This needs a FAQ entry at least.

> but we do have a fair amount of that ""SYMBOL && (SYMBOL||SYMBOL)"" 
> expression processing that's
> annoying to deal with.

I was referring to that, yes. I need to implement processing for it. I've
already implemented such processing in find, test, and twice in toysh (both
command && command and $((math&)) ).
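
For a feel of what that expression processing has to produce, the shell's own
arithmetic evaluator can stand in for it. This is only a sketch of the
expected results, not the planned implementation; SYM_A, SYM_B and SYM_C are
made-up symbol names, with 1 meaning enabled and 0 meaning disabled:

```shell
SYM_A=1 SYM_B=0 SYM_C=1
depends='SYM_A && (SYM_B || SYM_C)'

# Unquoted on purpose: $depends is expanded first, then the arithmetic
# evaluator looks up the variable names and applies && and ||.
first=$(( $depends ))

# Turn off the only enabled alternative and evaluate again.
SYM_C=0
second=$(( $depends ))

echo "$first $second"
```

A real kconfig parser can't lean on this (symbols can be strings or unset, and
there are "choice" blocks and ranges to deal with), but it shows the truth
values a "depends on" line should come out to.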

> Also a "choice" block and a few number ranges in the main Config.in we will
> need to deal with in some way, the depends/selects stuff seems easy but with
> that expr evaluating probably isn't

Yes, I know.

> I tried to write a kconfig parser (As a toy to make the codesharing easier)

I've written a bunch, and mostly thrown them away again. There's a simple one
in scripts/config2help.c and I wrote one in python at
https://landley.net/hg/kdocs/file/tip/make/menuconfig2html.py which generated
https://landley.net/kdocs/menuconfig/ way back when. (Those are the only two
published ones that come to mind, but I've written more over the years.)

> and got absolutely nowhere. The approach I took to it was...

This isn't the hard part. To me, the hard part is wanting to share lib/*.c code
with this new binary, which implies it would live in toys/example/*.c, which
means in the NEW design it would be a normal command that's "default n"... and
maybe depends on TOYBOX_BUILD or some such? Except moving stuff from scripts/*.c
to toys/*.c is conceptually ugly. But if we're getting rid of the
subdirectories... Maybe make.sh needs to be able to build commands that DON'T
live in toys/ but then...

Unanswered design questions looming here, have not been jigsawed into an elegant
picture yet. (How much of that is assembling pieces and how much is SAWING THEM
UP I don't know yet...)

Anyway, it seems like config2help.c should also share this plumbing if it's
parsing the kconfig input anyway, which is convenient since I've been meaning to
rewrite all that too (and yes THAT has a motivating "somebody is waiting for me
to fix this", ala https://github.com/landley/toybox/issues/458 ), but there's
also the usage: line regularization
(https://landley.net/notes-2023.html#06-11-2023) and fixing the remaining
sub-options with maybe some sort of help text include syntax for inserting other
help texts at controllable points (as either blogged about or mentioned here on
the list, I'd have to check my notes to see where I left off on that)...

Once I've got the design worked out, coding it is usually the easy part.

Rob


Re: [Toybox] hexdump tests.

2024-03-24 Thread Rob Landley
On 3/22/24 15:02, enh wrote:
>> > CANONICALIZE_SPACE_IF_RUNNING_HOST_VERSION=1? so we trust ourselves but 
>> > no-one
>> > else? :-)
>>
>> I _don't_ trust myself, and I'm not special. (That's policy.)
> 
> yeah, but that's why i suggested
> CANONICALIZE_SPACE_IF_RUNNING_HOST_VERSION --- that way we can say "we
> can't make hard assertions about the _host's_ whitespace, but we can
> still make hard assertions about _ours_". if we just canonicalize all
> the whitespace all the time, we can't (say) ensure that columns line
> up or whatever.

Or we could just "NOSPACE=1 TEST_HOST=1 make tests" if that's the test we want
to run...?

>> Erik did lash (lame-ass shell) to be tiny, Ash was the bigass lump of 
>> complexity
>> copied out of debian or some such and nailed to the side of the project by 
>> that
>> insane Russian developer who never did learn English and communicated 
>> entirely
>> through a terrible translator program (so any conversation longer than 2
>> sentences turned into TL;DR in EITHER direction, he was also hugely 
>> territorial
>> about anybody else touching "his" code), and msh was the minix shell mostly 
>> used
>> on nommu systems.
> 
> did lash _stay_ tiny?

Yes, but it was also borderline unusable.

> i feel like the trouble with projects like that
> is usually that no-one can agree on what's necessary versus bloat, so
> you trend towards just being a bad implementation of whatever. iirc
> inferno had _two_ different "tiny" shells.

Erik implemented something tiny for his own personal use, and ignored everybody
else who tried to add stuff to it.

When Erik moved on, I studied it. When I moved on, Bernhard removed it:

  https://git.busybox.net/busybox/commit/?id=96702ca945a8

>> > because, to be fair to the confused, in english
>> > "pending" _can_ legitimately mean "almost there". whereas your whole point 
>> > with
>> > pending is "i actually have _no_ idea how close this is yet".
>>
>> Linux has drivers/staging but I didn't like that.
> 
> yeah, "staging" also sounds very much like "nearly there!".

The problem is motivated reasoning. We could call the directory
instant_death_do_not_touch and people would still enable stuff in it to see if
it worked for them. (And then ship it when it Worked For Them.)

Rob


Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free

2024-03-24 Thread Rob Landley
On 3/22/24 10:26, enh wrote:
> On Fri, Mar 22, 2024 at 8:24 AM enh  wrote:
>> (tbh, just merging "lsb" into "other" would be a step forwards. wtf
>> is/was "lsb" anyway? and while i can _usually_ guess "POSIX or not?"
>> correctly, "lsb or other" is impossible by virtue of being
>> meaningless.)
> 
> (and to be clear, although "lsb" is particularly obscure, i think this
> is the same problem busybox's organization has: why do i have to care
> whether something is in coreutils or linux-utils or procps? how is
> that relevant to me?

There's a reason I didn't use that as an organizing method. Although I did try
to map them at the end of the roadmap, and need to redo that analysis now since
it's been a while...

> the best answer i can think of is "because i want
> to only use toybox/busybox to replace _that_ package", but i don't
> think the _directory structure_ helps there, right? that hypothetical
> person actually wants more metadata in the kconfig part of the comment
> inside each file?)

That's the theoretical use, yes. So distros (and system builders like gentoo,
buildroot, yocto, etc) can annotate package alternatives so if you want to
install busybox's tar instead of gnu tar your package management system could
cope. In practice, making something like dpkg handle that was near impossible,
and buildroot only did it because the maintainer of busybox created buildroot. I
tried to add toybox to buildroot years ago and...

https://lists.buildroot.org/pipermail/buildroot/2014-September/409298.html

People still try from time to time:

https://lists.buildroot.org/pipermail/buildroot/2017-January/181960.html
http://lists.busybox.net/pipermail/buildroot/2022-September/652474.html

But even a build system that ALREADY lets you swap in/out busybox vs gnu
versions of packages accomplished that by hardwiring busybox support deep into
its build system.

Getting something like debian to do that on the fly is... it's not really
designed for it.

I can think of better ways to do it (and am studying debian's build system in my
copious free time), but I've been busy with other things and most people aren't
motivated to try...

I note that I did it by hand back when creating aboriginal linux, which is what
led me to maintaining busybox in the first place, ala:

https://landley.net/aboriginal/old/

> When the Firmware Linux project started, busybox applets like sed and sort
> weren't powerful enough to handle the "./configure; make; make install" of
> packages like binutils or gcc. Busybox was usable in an embedded router or
> rescue floppy, but trying to get real work done with it revealed numerous
> bugs and limitations.
> 
> Busybox has now been fixed, and in Firmware Linux Busybox functions as an
> effective replacement for bzip2, coreutils, e2fsprogs, file, findutils, gawk,
> grep, inetutils, less, modutils, net-tools, patch, procps, sed, shadow,
> sysklogd, sysvinit, tar, util-linux, and vim. (Eventually, it should be
> capable of replacing bash and diffutils as well, but it's not there yet.)

That's the old page from before I restarted the project and renamed it
Aboriginal Linux (based on QEMU instead of User Mode Linux, ala
https://landley.net/notes-2005.html#27-10-2005). Before that I was going through
the Linux From Scratch package list and _disposing_ of gnu packages, one by one,
as I got busybox to replace them.

But "dpkg-query -S $(which $NAME)" is pretty easy to do the mapping yourself on
debian...

Rob


Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free

2024-03-24 Thread Rob Landley
On 3/22/24 10:24, enh wrote:
> On Thu, Mar 21, 2024 at 8:45 PM Rob Landley  wrote:
>> Anyway, toys/android basically meant (to me), "commands that come from and 
>> are
>> maintained by Elliott which I can't even test because they don't apply to a
>> vanilla linux system that isn't running the full android environment". 
>> Although
>> that's a personally idiosyncratic definition because I lumped selinux in with
>> that;
> 
> (heh. you beat me to it :-) )

If the new kconfig greyed out unavailable entries and had a status line saying
"depends on TOYBOX_ON_ANDROID" or similar when you cursored over a greyed out
entry...

There _is_ a way to collapse everything together into one directory and make it
manage-ish-able. But there are currently 52 command files in pending, and "ip.c"
alone is 6 commands and 3000 lines of "we already have route and ifconfig and
iptables and so on as separate commands, why did they do it again?"

>> It's been the status quo for a dozen years now (commit 3a9241add947 in 2012) 
>> and
>> moving everything AGAIN would have costs, so I'd want a reason and assurance
>> that we're not going to change our minds again.
> 
> for me the holy grail is "tab complete works and i don't have to think
> about arbitrary partitions".

It's a good point.

> i think "not yet default 'y'" is pretty
> defensible (though the reason we're having this discussion is because
> people _don't_ read "pending" as "danger, keep out!"), but the rest
> seem so arbitrary.

I'd like there to not BE "danger, keep out" in the tree, but a certain large
korean company wanted their contributions checked in, I fell behind, and it
snowballed from there.

>> Collapsing the directories
>> together when the last command is promoted (or deleted) out of pending might
>> make sense, figuring out what to do about example/ (trusting to the demo_ 
>> prefix
>> to annotate the example commands is nice, but hello.c hostid.c logpath.c and
>> skeleton.c would need... something).
> 
> no, i think example/ is defensible too. (i'd argue you're only ever
> going to look in there if you have a _reason_ to. or you've done a
> `grep -r` for something you're changing/checking all references to.
> the reason i completely forgot about example/ is that it never causes
> me the "where the fuck is _mount_?!" annoyance :-) )

Right now everything is at the same level. Having files at two different levels
is not a simplification.

Designing a way to have toys/*.c with no subdirectories and make it manageable
seems a reasonable goal, if tricky to get to. Having toys/*.c _and_ toys/*/*.c
does not smell like an improvement?

We've got: android  example  lsb  net  other  pending  posix

Pending needs everything cleaned up and promoted or deleted. Posix can be a
defconfig file. Example can be commands that "default n". Android isn't
necessary if a kconfig replacement greys things out instead of hiding them and
displays WHY they're greyed out when you cursor over them (and the rewrite is
needed to address pull request 332). Other, net, and lsb aren't sufficient
distinction to persist in the absence of other directories.

And that's all of them, I think?

If we really wanted to rush this, I could make a TOYBOX_UNFINISHED symbol that
the pending stuff could depend on, and then the blocker is the kconfig
replacement...

Not THIS release though. Working on release notes! (And lowering my standards on
the todo list.)

Rob


Re: [Toybox] [PATCH] more.c: More stuff, down cursor key scrolls down. Also stuff about less

2024-03-24 Thread Rob Landley
On 3/21/24 06:52, Jarno Mäkipää wrote:
> On Thu, Mar 21, 2024 at 1:08 AM Rob Landley  wrote:
>> >> > There is also a testing problem. vi.c doesn't do TEST_HOST because it 
>> >> > needs a -s option
>> >> > to pass in scripts to test with.
>> >>
>> >> Which is an issue I need to figure out how to address. What does a test 
>> >> that
>> >> only toybox passes actually prove? (That it hasn't changed since we last
>> >> looked at it?)
>> >
>> > There is vi -c which performs an ex command which we could implement
> 
> I took -s from vim, so toybox vi could be tested comparing to vim,
> since vi itself does not have -s. And I was not interested in -c since
> ex was out of the scope of implementation at that time.

I'm not saying it's bad, I'm saying it's not sufficient. (The toysh tests have
_both_ "testcmd" and "shxpect" tests.)

Also, I'm not UPSET that someone's been making vi usable. Something is better
than nothing and I'm thankful. I'm just really annoyed at myself for not having
been able to get to it myself in a reasonable amount of time.

The vi that's there has users, and at some point I _do_ need to go through and
digest it all and wrap my head around it and take ownership of the thing, but I
haven't even managed to reboot my laptop for months to install the new devuan
version and put the 16 gig memory sticks back in, because I've been opening tabs
as fast as I've been closing them and trying to close them turns into "let me
fix this one thing real quick"... (It's like trying to pack bookshelves and
winding up reading books, which I also spent too much of last month doing.)

>> I leave vi to the people who are maintaining that vi. I got out of way for 
>> that
>> command.
>>
> 
> Well I'm not sure who is "maintaining" vi.c at this point. I wrote the base
> implementation years ago, and Elliott extended it with a few commands
> because he had some use case for it. But mostly development has been
> dormant for a few years, with a few segfault bugfixes here and there. It's
> not a very pleasant experience to maintain it, since everything leads to huge
> bikeshedding: there is no particular standard to follow, and
> everyone wants different things.

Indeed. I taught an "intro to unix" course at austin community college many
moons ago which had like 20 vi keys on the syllabus (half of which were new to
me, and most of which I've forgotten again). And every time I install a fresh
debian I have to go through my checklist including:

  sudo ln -sf vimrc /etc/vim/vimrc.tiny && echo export EDITOR=vi >> ~/.profile

Because going into "insert" mode and having the cursor keys crap capital letters
all over your text is stupid (this vimrc.tiny mode STILL RUNS THE SAME BIG
EXECUTABLE), and as with dash and upstart and mir and unity I suspect Mark
Shuttleworth was behind it:

  https://mstdn.jp/@landley/112119853431329313

And no I'm not typing "vim" any more than gsed, gawk, or gmake...

> Also from what I understand reading
> your postings, you have never been very satisfied on it. And that is
> understandable.

The thing is, I'm not a vi expert any more than I was a sed expert before I
wrote my own sed (twice). At some point, I have to learn enough awk to write an
awk that can replace gawk in every package build in LFS and BLFS (and hopefully
someday AOSP), and I'm not looking forward to that. I know I _need_ to, but I'm
currently overwhelmed with half-finished stuff and am trying to dig out.

I'm somewhat familiar with the subset busybox chose for its vi, although that
was always missing several things I use, so good point of reference but not a
standard. And I need to read the posix standard for vi. And then I was going to
implement some low-hanging fruit and have people tell me what they missed...

>> >> I have been planning one all along, yes. The crunch_str() stuff I did was 
>> >> a
>> >> first pass at general line handling stuff that could be used by less and 
>> >> by
>> >> shell line editing and by vi and so on, but people wrote a vi that does 
>> >> not and
>> >> never will share code with the rest of those so that's off the table
>> >> permanently.
> 
> vi.c uses crunch_str from lib for utf8 handling; there were just a few
> corner cases where it needs to use the vi-only crunch_nstr, since it can't
> split up text until nul all the time. vi.c tried to use some other
> functionality from lib also, but some of it got removed from lib and
> some functionality has probably been added way after vi.c was written
> in 2018-2020.

I tend to do passes over the whole tree from time to time cleaning stuff up and
modernizing it. (I re-review commands I hadn't seen 

Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free > FYI lsb

2024-03-23 Thread Rob Landley



On 3/22/24 18:09, scsijon wrote:
> Date: Fri, 22 Mar 2024 08:24:18 -0700
> 
>> From: enh 
>> To: Rob Landley 
>> Cc: Oliver Webb , toybox
>>  
>> Subject: Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free
>>
>> On Thu, Mar 21, 2024 at 8:45 PM Rob Landley  wrote:
>>> On 3/17/24 14:52, Oliver Webb wrote:
>>>> On Thursday, March 14th, 2024 at 12:04, enh  wrote:
>>>>> at a high level, it does seem like many/most people interpret "pending" 
>>>>> as "almost done" (he says, being part of the problem himself, having 
>>>>> several pending things building and shipping on all Android devices) 
>>>>> whereas in actual fact it can mean anything from "yeah, actually pretty 
>>>>> much done" to "will be completely rewritten" via "still just trying 
>>>>> random experiments trying to work out _how_ this should be rewritten".
>>>>> sadly i don't have a better suggestion...
>>>> pending/experimental and pending/functional maybe, or something along that 
>>>> gist?
>>> That would be my "not adding more complexity to manage transient clutter 
>>> that
>>> should instead go away" objection, already made.
>>>
>>>> Then again it'd make it harder to track the history of pending commands, 
>>>> adding only new ones
>>>> to those 2 directories would fix that, but would make the organizational 
>>>> problem for the old
>>>> ones worse.
>>> https://en.wikipedia.org/wiki/Fundamental_theorem_of_software_engineering
>>>
>>> Stop. No. Halt. Wait. Hold it. Woah. Cease. Desist. Caution severe tire 
>>> damage.
>>> Klatu barata nikto. Subcalifragilisticexpialidocious.
>>>
>>>>> a branch would be the usual git option, but that would probably mean "no 
>>>>> pending stuff in the main branch"
>>>> Also a problem if you want to switch Version Control systems or distribute 
>>>> tarballs without a .git/ directory.
>>> I already DID switch version control systems (from mercurial to git), and I
>>> already distribute release tarballs. Why do you think these are new issues?
>>>
>>>> It'd hide these commands too,
>>> I want to close tabs. I am not creating additional scaffolding for clutter
>>> management:
>>>
>>> $ ls -d */toys
>>> clean3/toys  clean8/toys     github/toys  kl4/toys  kl9/toys    toybox/toys
>>> clean5/toys  clean.old/toys  kl10/toys    kl6/toys  kleen/toys
>>> clean6/toys  clean/toys      kl2/toys     kl7/toys  kl/toys
>>> clean7/toys  debian/toys     kl3/toys     kl8/toys  release/toys
>>>
>>> I already try not to publish quite as much clutter as accumulates locally.
>>>
>>> There's some real fossils checked into the tree. I started work on gene2fs 
>>> back
>>> under busybox, checked in what I had to the toybox repo in 055cfcbe5b05 in 
>>> 2007
>>> and haven't LOOKED at it this decade because I just haven't gotten back 
>>> around
>>> to it. Since then they INVENTED EXT4. (I still hope to get back to it, but 
>>> at
>>> the moment I'm answering email.)
>>>
>>>> For the first time I checked if there were any special branches in the 
>>>> repo because
>>>> I didn't bother to think about that in the months I spent working on it.
>>>>
>>>>> i still struggle between "other" and "lsb" in particular.
>>>> Same here, I can remember the posix commands.
>>> Can you? I still have to check some from time to time, and the definition of
>>> whether "tar" is a posix command or not is outright eldritch bordering on 
>>> quantum.
>>>
>>>> But I don't care about LSB enough to
>>>> memorize everything it wants. And keeping all completed commands that 
>>>> aren't in posix,
>>>> lsb, networking or android
>>> The "example" directory is important because it's the only other directory 
>>> of
>>> commands that should not "default y" in defconfig. It has a policy 
>>> distinction.
>>>
>>> Back in 2012, when the number of commands was growing fast and having one 
>>> big
>>> directory of them all was getting a bit busy, the alternative of sorting 
>>> them

Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free

2024-03-23 Thread Rob Landley
On 3/21/24 23:59, Oliver Webb wrote:
> On Thursday, March 21st, 2024 at 22:45, Rob Landley  wrote:
>> On 3/17/24 14:52, Oliver Webb wrote:
>> > Same here, I can remember the posix commands.
>> 
>> Can you? I still have to check some from time to time, and the definition of
>> whether "tar" is a posix command or not is outright eldritch bordering on 
>> quantum.
> 
> I can certainly remember them better than the LSB commands. Most of the time 
> I can
> remember if a command is in posix, which is what matters when trying to find 
> it usually.

Congratulations?

>> Collapsing the directories together when the last command is
>> promoted (or deleted) out of pending might make sense,
> 
> What would happen when a new command shows up and we need to evaluate it then?

Presumably once caught up there wouldn't usually be a dozen of them submitted
the same month, so I wouldn't fall far enough behind to need a dedicated waiting
room.

> Or glibc does a new release and yet another thing breaks we need to demote and
> re-promote eventually?

I don't de-promote commands because glibc does something stupid each new
release. That's just normal gnu/braindamage:

https://github.com/landley/toybox/issues/450
https://github.com/landley/toybox/pull/364
https://github.com/landley/toybox/issues/362

I de-promoted a command since last release because I rewrote lib/password.c in a
way that broke stuff and didn't want people poking me about it, which was me
being lazy/whelmed. Not having the option to do that is fine too, and would have
made that stay higher on the todo list. (I could also have "default n" it
without moving it, I do that locally all the time when in-progress changes break
stuff. The difference this time was I'd checked IN the stuff that broke a
command, and didn't want to revert it.)

>> I also note I think I've figured out how to replace kconfig: I can just make 
>> a
>> list that scrolls up and down with a highlighted entry you hit space on, 
>> handle
>> help text, search, exit/save, resolve selects and depends and have "menus" 
>> be a
>> label line with its contents nested two spaces further to the right.
> 
> [Some paragraphs bikeshedding about kconfig used to be here, may they rest in 
> a text file
> until we get around to doing the kconfig rewrite]

Technically a project's maintainer explaining upcoming design issues he actually
plans to implement isn't "bikeshedding".

Bikeshedding is vaguely related to the Dunning-Kruger effect, in which the
question "how hard can it be?" requiring some expertise to actually answer gets
people in trouble.

Cyril Parkinson is mostly known for Parkinson's Law (work expands to fill
available time) but he also came up with the bike shed example, where a
committee approving plans for a nuclear reactor defers to the experts enough
that at least its budget approval gets discussed quickly, but a committee
approving plans for a bike shed will argue far longer about every detail because
they think they could do it themselves and have strongly held opinions.

Everybody has an opinion on building the bike shed, and thinks their opinion is
equally valid as everyone else's with no deference to authority, experience, or
expertise. But the thing about a committee approving plans is they STARTED with
a viable plan for the thing, which they then ignore because they know better.

If you feel like I'm "bikeshedding" about a kconfig replacement when I was
involved in https://lkml.indiana.edu/hypermail/linux/kernel/0202.1/2037.html and
argued at length with Roman Zippel about https://lwn.net/Articles/160497/ and
dug rather a lot through busybox's fork of it back around
https://git.busybox.net/busybox/log/scripts/config?id=7a43bd07e64e and already
implemented scroll up/down/left/right list logic like I'm describing in the
"top" command... I think we have a different definition of the term.

>> > A possible solution is to...
>> 
>> ...
>> 
>> > Then again...
>> 
>> I need to stop checking email every time I sit down at my laptop, because
>> bikeshedding can eat an endless amount of time and I've got other stuff to 
>> do.
>> 
>> For one thing, I promised to look at
>> https://github.com/landley/toybox/issues/486 tonight.
> 
> Sorry for getting in the way of that, the technical discussion about it was
> interesting enough to me to respond to. Recently found something to run off to
> and do while still benefiting toybox, so I'll stop bikeshedding about stuff 
> like this.

I'm complaining about my own insufficient time management skills, I'm not trying
to discourage people from taking an interest in the project.

I do find "why is it like this" easier to deal with than "l

Re: [Toybox] test.sh: Don't override "C" command path in TEST_HOST if it's set

2024-03-22 Thread Rob Landley
On 3/22/24 16:11, Rob Landley wrote:
> On 3/21/24 21:38, Oliver Webb via Toybox wrote:
>> A mildly annoying issue if you are trying to test with different 
>> implementations of commands
>> such as plan9 ones or sbase or busybox ones, things with different 
>> conflicting implementations 
>> of things like xxd or vi. With this patch you can do "make test_cmd 
>> TEST_HOST=1 C=/path/to/other/cmd"
>> and have it work
> 
> I've been doing "PATH=/path/to/thingy:$PATH TEST_HOST=1 make test_cmd" for
> years, I didn't know that needed to be documented...

P.S. The point of C= being a path is otherwise shell builtins tend to get called
(so you're not necessarily testing what you think you are), and last I checked I
hadn't found a portable mechanism for disabling a specific shell builtin other
than providing a path to the command to run. (If you disable _all_ shell
builtins the test script could break due to missing commands on some systems.)

Rob


Re: [Toybox] test.sh: Don't override "C" command path in TEST_HOST if it's set

2024-03-22 Thread Rob Landley
On 3/21/24 21:38, Oliver Webb via Toybox wrote:
> A mildly annoying issue if you are trying to test with different 
> implementations of commands
> such as plan9 ones or sbase or busybox ones, things with different 
> conflicting implementations 
> of things like xxd or vi. With this patch you can do "make test_cmd 
> TEST_HOST=1 C=/path/to/other/cmd"
> and have it work

I've been doing "PATH=/path/to/thingy:$PATH TEST_HOST=1 make test_cmd" for
years, I didn't know that needed to be documented...

Rob


Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free

2024-03-21 Thread Rob Landley
On 3/17/24 14:52, Oliver Webb wrote:
> On Thursday, March 14th, 2024 at 12:04, enh  wrote:
>> at a high level, it does seem like many/most people interpret "pending" as 
>> "almost done" (he says, being part of the problem himself, having several 
>> pending things building and shipping on all Android devices) whereas in 
>> actual fact it can mean anything from "yeah, actually pretty much done" to 
>> "will be completely rewritten" via "still just trying random experiments 
>> trying to work out _how_ this should be rewritten".
>> sadly i don't have a better suggestion...
> 
> pending/experimental and pending/functional maybe, or something along that 
> gist?

That would be my "not adding more complexity to manage transient clutter that
should instead go away" objection, already made.

> Then again it'd make it harder to track the history of pending commands, 
> adding only new ones
> to those 2 directories would fix that, but would make the organizational 
> problem for the old
> ones worse.

https://en.wikipedia.org/wiki/Fundamental_theorem_of_software_engineering

Stop. No. Halt. Wait. Hold it. Woah. Cease. Desist. Caution severe tire damage.
Klatu barata nikto. Subcalifragilisticexpialidocious.

>> a branch would be the usual git option, but that would probably mean "no 
>> pending stuff in the main branch"
> 
> Also a problem if you want to switch Version Control systems or distribute 
> tarballs without a .git/ directory.

I already DID switch version control systems (from mercurial to git), and I
already distribute release tarballs. Why do you think these are new issues?

> It'd hide these commands too,

I want to close tabs. I am not creating additional scaffolding for clutter
management:

$ ls -d */toys
clean3/toys  clean8/toys     github/toys  kl4/toys  kl9/toys    toybox/toys
clean5/toys  clean.old/toys  kl10/toys    kl6/toys  kleen/toys
clean6/toys  clean/toys      kl2/toys     kl7/toys  kl/toys
clean7/toys  debian/toys     kl3/toys     kl8/toys  release/toys

I already try not to publish quite as much clutter as accumulates locally.

There's some real fossils checked into the tree. I started work on gene2fs back
under busybox, checked in what I had to the toybox repo in 055cfcbe5b05 in 2007
and haven't LOOKED at it this decade because I just haven't gotten back around
to it. Since then they INVENTED EXT4. (I still hope to get back to it, but at
the moment I'm answering email.)

> For the first time I checked if there were any special branches in the repo 
> because
> I didn't bother to think about that in the months I spent working on it. 
> 
>> i still struggle between "other" and "lsb" in particular.
> 
> Same here, I can remember the posix commands.

Can you? I still have to check some from time to time, and the definition of
whether "tar" is a posix command or not is outright eldritch bordering on 
quantum.

> But I don't care about LSB enough to
> memorize everything it wants. And keeping all completed commands that aren't 
> in posix,
> lsb, networking or android

The "example" directory is important because it's the only other directory of
commands that should not "default y" in defconfig. It has a policy distinction.

Back in 2012, when the number of commands was growing fast and having one big
directory of them all was getting a bit busy, the alternative of sorting them
into directories was annotating them with tags, and THAT was a nightmare (of the
"this command has three tags" variety). And also implied future pressure to
extend the existing kconfig implementation to USE the tags, which would be 
worse.

Moving them into subdirectories, with each command in ONE directory, and a
README explaining what the directory was for, with kconfig automatically
displaying them in menus and using the first line of the README as the menu's
title, seemed the least bad crowd control option at the time.

> in a massive "other" folder sorta defeats
> the purpose of these directories which are supposed to reduce clutter.

It wasn't really about reducing clutter. I mean yes, back then some web viewers
wouldn't display more than 250 files in a directory, the way github truncates at
1000 today:

https://github.com/landley/linux/tree/master/arch/arm/boot/dts

But the goal was annotating command categories. Posix and pending are obvious,
and I mentioned example. Back when I split them up, LSB was still a viable
standard (the Linux Foundation hadn't destroyed it yet), and it STILL kind of
means "this command existed back in Y2K and was considered part of the base
system back then, even if posix never caught up". Several commands in pending
get promoted into LSB (such as most of the password stuff, although oddly
mkpasswd is NOT in lsb 4.1).

Hmmm, possibly instead of a dead standard the linux foundation killed, I should
instead check the $PATH of my old red hat 9 install from the dawn of time...
Hah, it's still on busybox's website:
https://busybox.net/downloads/qemu/rh-9-shrike.img.bz2 Login as user 

Re: [Toybox] [PATCH] toysh: Shut up TEST_HOST, correct 3 test cases

2024-03-21 Thread Rob Landley


On 3/17/24 10:23, Oliver Webb wrote:
> On Fri, Mar 8, 2024 at 19:46, Rob Landley wrote:
>> On 3/7/24 19:39, Oliver Webb via Toybox wrote:
>> > Looking at toysh again since the toybox test suite should run under it
>> > (in mkroot or under a chroot) A problem seems to be that there is no
>> > return command, which breaks runtest.sh to its core. Don't know how to add
>> > one in yet
>> >
>> > On my version of bash (5.2.26) TEST_HOST fails on 3 test cases,
>> > and toysh also fails on those cases (Even tho toysh is doing the right
>> > thing, the same as bash) The attached patch changes the test file
>> > so that 3 test cases are resolved. And TEST_HOST works
>>
>> Because Chet changed stuff I asked him about, making bash a moving target.
> This does bring up the question of what to do with specific edge cases. Since
> bash can’t even be consistent with itself, most bash scripts don’t rely on 
> them,
> at least the ones I’ve seen.
> 
> Should we set out to implement every specific edge case, and if so what 
> version
> are we confirming with? Or should we pick what’s most sensible/easiest to deal
> with and toyonly the test cases for them.

I've been studying the problem space since 2006, have read the bash manual all
the way through more than once, read some subset of the "advanced bash scripting
guide", and was basically making judgement calls. Then Elliott got me talking
directly to the bash maintainer, which from my perspective made a lot of those
corner cases a moving target when they weren't before.

In fact my FIRST pass at this was matching the bash 2.04b behavior from like
1999 that I used in aboriginal linux, until gentoo's portage scripts needed
newer bash features, specifically =~ and some quoting corner case behavior...

"What should all those judgement calls be ahead of time, I demand preemptive
policy" does not personally strike me as helpful. I was mostly trying to
implement what seemed good to me (which still involves asking a LOT of questions
and turning them into test cases to see what bash's behavior actually IS), then
run the Linux From Scratch and Beyond Linux From Scratch package builds through
it to see what broke, then wait for people to complain and take it on a case by
case basis.

Rob


Re: [Toybox] [PATCH] more.c: More stuff, down cursor key scrolls down. Also stuff about less

2024-03-21 Thread Rob Landley
On 3/21/24 16:13, Oliver Webb wrote:
> On Thursday, March 21st, 2024 at 15:53, Rob Landley  wrote:
> 
>> I note that "more" is from the days of daisy wheel teletypes, and was thus
>> designed to work ok without a tty or interaction through cursor keys (you can
>> export $COLUMNS and $LINES or just let it guess 80x25), and "less" requires a
>> tty and cursor keys. This might make "more" a better fit for on-screen 
>> keyboards
>> that don't provide cursor keys. (Or not...)
> 
> less supports vi keys (hjkl), and all the keybindings of more. less doesn't 
> require
> cursor keys in the same way vi doesn't, it's just how it's more commonly used.

Piping data through more doesn't allocate memory. Piping data through less
continues to allocate memory as data is accumulated. I don't know if there's a
backscroll limit, so I don't know if there's a limit on the amount of memory it
allocates.

>> I would like to have one implementation sharing code. Implementing "less -R"
>> cuts the behavior delta between the two, and having an option to let ctrl-c 
>> exit
>> less (instead of just killing the rest of the pipeline) probably gets us 
>> close
>> enough to handwave the rest?
> 
> There is less -K and less -E (exit on C-c and exit at EOF respectively),

Good to know.

> so more_main would look something like:
> 
> void more_main(void)
> {
>   toys.optflags |= FLAG_E|FLAG_K|FLAG_R;
>   less_main();
> }
> 
> Once we have a good enough less.

We could implement a big thing and have it pretend to be a small simple thing, 
yes.

Rob


Re: [Toybox] [PATCH] more.c: More stuff, down cursor key scrolls down. Also stuff about less

2024-03-21 Thread Rob Landley
On 3/20/24 11:47, enh wrote:
> On Wed, Mar 20, 2024 at 9:38 AM Rob Landley <r...@landley.net> wrote:
> 
> > On 3/20/24 00:02, Oliver Webb via Toybox wrote:
> > > I spotted the more implementation in pending. Looking at it, it's missing
> > > quite a lot of stuff,
> > > Such as the ability to go back in a file.
> >
> > More never had the ability to go backwards, less did. Different command.
> 
> (...but there's a lot of confusion because many modern systems have more just a
> symlink to less.)

Ooh, there's a fun edge case.

A failure mode of busybox is what if you symlink an unknown name to an existing
command, busybox says the unknown name is an unknown command. But in toybox, if
it doesn't recognize the name toybox_main loops resolving symlinks until it runs
out of them or hits a recognized name:

  // fast path: try to exec immediately.
  // (Leave toys.which null to disable suid return logic.)
  // Try dereferencing symlinks until we hit a recognized name
  while (s) {
    char *ss = basename(s);
    struct toy_list *tl = toy_find(ss);

    if (tl==toy_list && s!=toys.argv[1]) unknown(ss);
    toy_exec_which(tl, toys.argv+1);
s = (0 less -> toybox would act
like more, not like less. (Unless you configured more out but left less in, then
it should behave like less.)

I note that "more" is from the days of daisy wheel teletypes, and was thus
designed to work ok without a tty or interaction through cursor keys (you can
export $COLUMNS and $LINES or just let it guess 80x25), and "less" requires a
tty and cursor keys. This might make "more" a better fit for on-screen keyboards
that don't provide cursor keys. (Or not...)
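
A minimal sketch of that "no tty needed" sizing, assuming made-up helper names (env_dimension() and guess_term_size() are hypothetical, not what toybox calls its terminal size logic): use $COLUMNS and $LINES when they hold sane positive numbers, otherwise guess 80x25, without ever querying a terminal.

```c
#include <stdlib.h>

// Hypothetical helper, not the toybox implementation: parse a positive
// integer from environment variable "name", or return "fallback" if the
// variable is unset, empty, non-numeric, or out of range.
static int env_dimension(const char *name, int fallback)
{
  char *s = getenv(name), *end;
  long val;

  if (!s || !*s) return fallback;
  val = strtol(s, &end, 10);
  if (*end || val < 1 || val > 9999) return fallback;

  return (int)val;
}

// Fill in *cols and *rows from $COLUMNS and $LINES, guessing 80x25.
void guess_term_size(int *cols, int *rows)
{
  *cols = env_dimension("COLUMNS", 80);
  *rows = env_dimension("LINES", 25);
}
```

A pager built this way behaves the same whether its output is a terminal, a pipe, or a printer, which is roughly the property being described for "more". The 9999 upper bound is an arbitrary sanity limit chosen for the sketch.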

I would _like_ to have one implementation sharing code. Implementing "less -R"
cuts the behavior delta between the two, and having an option to let ctrl-c exit
less (instead of just killing the rest of the pipeline) probably gets us close
enough to handwave the rest?

I need to genericize my watch.c code to share the cursor tracking with less.
Possibly keep a scrollback buffer. Except there's still some extension because
watch.c doesn't let you cursor left and right...

Rob


Re: [Toybox] [PATCH] mount: avoid deferencing NULL.

2024-03-20 Thread Rob Landley
On 3/20/24 16:07, enh via Toybox wrote:
> I don't know why I wasn't seeing this yesterday

Because /sys was mounted, so readfile() returned a string with its contents.
(And/or race condition of the mount going away between reading /proc/mounts and
asking for follow-up data about a specific mount point from sysfs.)

Sigh, I initialized ss to "" so I could just printf("%s", ss) without testing,
but readfile() returns NULL when the file doesn't exist and I overwrite it in
place because I didn't want to juggle through a THIRD variable (mostly because
I'm out of convenient names for them), and I missed an else setting it BACK to
"" in the NULL case.

Adding the one test doesn't fix printf() calling null, which segfaults on some
libcs. Lemme put the else in...

The real design failure here is that if the readfile() returns an empty string
we won't free it, but that should never happen, the amount of memory leaked
would be trivial and the command exits at the end of the list.

Hmmm... well, I COULD move the s = xabspath(mm->device, 0) down to the end of
the if (*s == '/') and then use THAT as my third variable...

Ok, I rewrote the code to use three variables and thus leave the "" in ss when
it doesn't have reason to change it. (Single Point of truth, setting it BACK to
"" and thus having two "" constants was icky. Yeah, tiny flaw but _I_ saw it.)

Commit d298747580c7 and once again I've only tested the "file exists" path, I'm
not unmounting sysfs on my work laptop and haven't got a convenient test vm I
can loopback mount a filesystem image in at the moment. (The devuan install iso
image I've been using is a bit big to stick in initramfs...)
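
The three-variable pattern can be sketched roughly like this. It is an illustration under assumptions, not the code in the commit: read_or_null() is a hypothetical stand-in for readfile() (returning a malloc()ed string, or NULL when the file can't be read), and describe() is invented for the example. The "" default appears in exactly one place, and only a successful read replaces it, so %s never sees NULL.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical stand-in for readfile(): return up to the first 4095 bytes
// of the file as a malloc()ed NUL-terminated string, or NULL if the file
// can't be opened.
static char *read_or_null(const char *path)
{
  FILE *fp = fopen(path, "r");
  char *buf;
  size_t len;

  if (!fp) return 0;
  if (!(buf = malloc(4096))) {
    fclose(fp);
    return 0;
  }
  len = fread(buf, 1, 4095, fp);
  buf[len] = 0;
  fclose(fp);

  return buf;
}

// Format "name: contents" into out. ss keeps its "" default unless the
// read succeeds, so there is a single "" constant and no NULL reaches %s.
int describe(char *out, size_t size, const char *name, const char *path)
{
  char *ss = "", *tmp = read_or_null(path);
  int len;

  if (tmp) ss = tmp;
  len = snprintf(out, size, "%s: %s", name, ss);
  free(tmp);  // free(NULL) is a no-op, so the missing-file path is safe.

  return len;
}
```

Keeping the allocation in its own variable also means the free() doesn't have to guess whether ss points at heap memory or at the string constant.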

Rob


Re: [Toybox] [PATCH] chattr.test: awk -> cut so mkroot can run it

2024-03-20 Thread Rob Landley
On 3/20/24 15:20, Oliver Webb via Toybox wrote:
> Patch does what it says on the tin. First thing I caught while doing a test 
> of all commands
> in mkroot chattr fails all tests on my system (A ton of "Operation not 
> permitted" errors, 
> on ext4), but the failures are consistent with TEST_HOST so I guess chattr 
> doing what it's
> supposed to? (Yes, I ran it as root) The .test file will need a rewrite 
> eventually but right
> now I'm just trying to get all tests to run under mkroot

Elliott keeps sending me patches to remove bashisms from the test suite so it
works under mksh, which I was intentionally leaving in because I intend to
implement that before 1.0 and wanted to dogfood it. I have a file of the ones
that were removed so I can put them BACK at some point...

Rob


Re: [Toybox] [RFC] mkroot: Possible solution to running tests in a vacum: Use the host bash in a chroot

2024-03-20 Thread Rob Landley
On 3/20/24 12:38, Oliver Webb via Toybox wrote:
> A target for the 0.9 release is the test suite running under mkroot,

On all the architectures mkroot supports (endianness, word size, kernel
version), under qemu with a known kernel environment so we can test things like
insmod with known modules, or test ifconfig and hostname without destabilizing
my development laptop.

> Which is also required
> for passwd to be re-promoted (We need to test it in a vacuum).

Eh, I can test that manually for one release. My problem is I keep getting
distracted by tangents. The "create changelog" todo item made it as far as
commit 40e73a387329 which has a pending TODO item I tried to fix (refill toybuf
to try to span EXIF data when file is identifying JPEG files), I need to instead
WRITE THAT DOWN (and leave it unfixed for now) and continue to the end of the
list so I _have_ a current changelog and CAN cut a release... but haven't yet.

> The main downside of this is that you have to look for the dynamic libraries 
> bash wants and
> copy them into the fs directory,

The code I wrote to do that way back when was something like:

  https://git.busybox.net/busybox/commit/?id=3a324754f88b

I.E. recursively call ldd to see what its dependencies are, repeat until you run
out of dependencies.

I _can_ make this work. It's just not the direction I wanted to go in.

> and doing a chroot requires root permissions. Also it is very
> clearly not a permanent fix (None of this is needed once toysh is ready), 
> just enough to get
> tests for commands like passwd and chsh running. Another downside of 
> chroot-ing is you can't
> emulate things that depend on drivers or nommu.
> 
> Attached is a mkroot package (Not a patch), that sets up a environment to run 
> the test suite
> under a chroot in. (./mkroot/mkroot.sh testwhost && sudo chroot root/host/fs 
> /test command_name).
> It's not something I'm actually expecting to be merged, but that doesn't mean 
> it's not potentially
> useful for testing the commands that modify /etc/passwd and friends.

I was setting up a debootstrap to test it under, since that's presumably
isolated enough, but last time I sat down to poke at that I got distracted into
the Orange Pi 3b server setup which is the _other_ consumer of a debootstrap I
have lying around, and then I went "too much for now but I can at least do the
testing under a qemu-system-arm64 with devuan arm64 debootstrap" and hit the
fact that trying to marshall a tarball into mkroot using "wget | tar xpv" spat
out endless unexpected EOF files because the tarball autodetection logic had a
regression and the child process thinks it's the parent process.

Still have a tab open for that, trying to dig back down to fix it, been
distracted by external pokes instead.

> Also when making this I spotted some things in the build infrastructure we 
> will need to work around
> in a airlock-ed test suite, test.sh needs configure,

Only for single builds, not for testing all of toybox.

And presumably I should add a TEST_EXISTING=1 to skip the single build and just
grab the command out of the current directory and/or $PATH. (There's always more
work to do on the test suite...)

> and portability.sh needs something for CC or
> else it will throw a fit.

I know, one of my trees has a partial patch for it, but there's some design 
work...

Rob


Re: [Toybox] [PATCH] more.c: More stuff, down cursor key scrolls down. Also stuff about less

2024-03-20 Thread Rob Landley
On 3/20/24 11:56, Oliver Webb wrote:
> On Wednesday, March 20th, 2024 at 11:39, Rob Landley  
> wrote: 
>> More never had the ability to go backwards, less did. Different command.
> 
> From the more help text you get when you press "h":
> 
> b or ctrl-B Skip backwards k screenfuls of text [1]

$ ls -l /bin/more
-rwxr-xr-x 1 root root 47816 Nov 27  2019 /bin/more
$ dpkg-query -S /bin/more
util-linux: /bin/more
$ cat README | more

I hit space twice to advance, then hit b and ctrl-b a lot, no effect. At a
guess, that particular gnu/dammit extension which isn't in posix or busybox more
only works on a seekable file.

I personally don't want to complicate more because less exists.

>> > Looking at the other keybindings GNU more provides which I can implement, 
>> > There's "=" (prints current
>> > line number) ":f" (print filename and line), as well as being able to use 
>> > the down arrow to go down
>> > (with the added side effect of any escape key doing so too, not the end of 
>> > the world, especially
>> > since we can't scroll up). Those are implemented in the attached patch.
>> 
>> Again, more and less are not the same command.
> 
> No, all of that is more behavior that you can use in more. Try it.

I just did, above.

One easy way to distinguish between less and more is that ctrl-c exits more, but
ctrl-c only kills the command producing less's output while leaving less
displaying the scrollback buffer. You need to hit 'q' to get out of less (unless
you've hit something like forward slash where q just appends to the string and
esc doesn't exit that either, but ctrl-c will exit... back to the less prompt),
meaning newbies can get STUCK in less not knowing how to exit it, the way you
can't in "more".

Oh, here's another difference:

$ { echo -e '\e[42mcolor\e[0m text'; while true; do echo hello; done; } | more
shows the color change
$ { echo -e '\e[42mcolor\e[0m text'; while true; do echo hello; done; } | less
shows the escape sequences

That's why I'm still pondering if they can/should usefully share code.

>> > There is also a testing problem. vi.c doesn't do TEST_HOST because it 
>> > needs a -s option
>> > to pass in scripts to test with.
>> 
>> Which is an issue I need to figure out how to address. What does a test that
>> only toybox passes actually prove? (That it hasn't changed since we last
>> looked at it?)
> 
> There is vi -c which performs an ex command which we could implement

I leave vi to the people who are maintaining that vi. I got out of the way for
that command.

>> I have been planning one all along, yes. The crunch_str() stuff I did was a
>> first pass at general line handling stuff that could be used by less and by
>> shell line editing and by vi and so on, but people wrote a vi that does not 
>> and
>> never will share code with the rest of those so that's off the table
>> permanently.
> 
> My experience is in vi.c which is why I mentioned using code from it. I 
> haven't read
> through top or hexedit

I haven't read through the vi.c in pending.

>> > But I have to ask the question "If it's so easy, why isn't it in toybox 
>> > yet?" Is it just because
>> > other TODO items taking up time, or is it because it's harder to implement 
>> > than it seems.
>> 
>> Because I care about edge cases like ansi escapes and utf8 fontmetrics and
>> resizing the screen partway through displaying, because I haven't got test 
>> suite
>> infrastructure that can emulate the other half of a PTY yet without which
>> testing has to be manual, because I wanted multiple things to share
>> infrastructure (including potentially stuff like "fold")...
> 
> So it's harder to implement than it seems, thank you.

Well it's harder to TEST. There's a reason I've been working towards mkroot
based test infrastructure with a known kernel and pty wrappers. Kinda hard to
test "ps" or "top" or "watch" at present either.

I have PART of that test infrastructure in the txpect plumbing in
scripts/runtest.sh which does expect style input/output scripts, currently just
used for sh tests via a "shxpect" wrapper, but the general idea is explained in
the comment before the shell function in runtest.sh:

# Simple implementation of "expect" written in shell.

# txpect NAME COMMAND [I/O/E/X/R[OE]string]...
# Run COMMAND and interact with it:
# I send string to input
# OE read exactly this string from stdout or stderr (bare = read+discard line)
#note: non-bare does not read \n unless you include it with O$'blah\n'
# R prefix means O or E is regex match (read line, must contain substring)
# X close stdin/stdout/stderr 

Re: [Toybox] [PATCH] more.c: More stuff, down cursor key scrolls down. Also stuff about less

2024-03-20 Thread Rob Landley
On 3/20/24 00:39, David Seikel wrote:
> On 2024-03-20 05:02:11, Oliver Webb via Toybox wrote:
>> But I have to ask the question "If it's so easy, why isn't it in toybox 
>> yet?" Is it just because
>> other TODO items taking up time, or is it because it's harder to implement 
>> than it seems.
> 
> I might be able to shine some light on that.
> 
> Long ago I wrote boxes, designed to be a starting point for a generic
> editor / pager for toybox.  It included very basic implementations of
> less, more, and several well known text editors, including vi.  All done
> as wrappers around a common core.
> 
> Even after trimming it down to just an example shell line editor, it was
> deemed too big to read.

I read it:

http://lists.landley.net/pipermail/toybox-landley.net/2015-May/023506.html

Rob


Re: [Toybox] [PATCH] more.c: More stuff, down cursor key scrolls down. Also stuff about less

2024-03-20 Thread Rob Landley
On 3/20/24 00:02, Oliver Webb via Toybox wrote:
> I spotted the more implementation in pending. Looking at it, it's missing 
> quite a lot of stuff, 
> Such as the ability to go back in a file.

More never had the ability to go backwards, less did. Different command.

> It's built in a way where nothing is accumulated, Which 
> means that support for that would require a half-rewrite.

My todo item in more was that it has to wordwrap to the current screen size to
figure out when it's done a line, and that implies UTF8/unicode fontmetrics, and
also implies ANSI escape sequence handling (for at least color changes).
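
A minimal sketch of the "ANSI escapes take no screen width" half of that problem, assuming the only escapes present are CSI sequences (ESC, '[', numeric parameters, one final letter). The function name visible_len is hypothetical, and it counts characters rather than display columns, so double-width and combining characters are not handled:

```shell
# Count the characters a string would occupy once CSI escape sequences
# (ESC [ params letter) are removed. Assumption: no other escape types.
visible_len() {
  esc=$(printf '\033')
  # Strip every ESC [ <digits and semicolons> <letter> sequence.
  stripped=$(printf '%s\n' "$1" | sed "s/${esc}\[[0-9;]*[A-Za-z]//g")
  # ${#var} counts characters in most shells (bytes in some minimal ones).
  printf '%s\n' "${#stripped}"
}
```

Feeding it the colored test string used elsewhere in this thread, visible_len "$(printf '\033[42mcolor\033[0m text')", measures only the "color text" part.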

> What does POSIX specify as far as options?

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/more.html

> Looking at the man page, quite a bit. None of which I've
> ever seen used before.
> 
> Looking at the other keybindings GNU more provides which I can implement, 
> There's "=" (prints current
> line number) ":f" (print filename and line), as well as being able to use the 
> down arrow to go down 
> (with the added side effect of any escape key doing so too, not the end of 
> the world, especially
> since we can't scroll up). Those are implemented in the attached patch.

Again, more and less are not the same command.

> There is also a testing problem. vi.c doesn't do TEST_HOST because it needs a 
> -s option
> to pass in scripts to test with.

Which is an issue I need to figure out how to address. What does a test that
_only_ toybox passes actually prove? (That it hasn't changed since we last
looked at it?)

> less and more are _worse_ since they don't change anything.

They (and vi) care about current terminal size. Automating tests for that
probably involves a PTY wrapper with a pty master side that can do things like
query the current (virtual) cursor position, which I've sometimes sketched out
notes for but not actually tried to write yet. (It's after "get the test suite
running under qemu in mkroot".)

> Manually testing often introduces regressions, so I dunno what the solution 
> is here

I have design ideas. Blogged about them some years ago, I think? (Or maybe wrote
about them in the mailing list web archive, which dreamhost put some rather
large holes in on more than one occasion because dreamhost...)

> This patch improves things for now, but if we are planning on doing a
> future less implementation. 

I have been planning one all along, yes. The crunch_str() stuff I did was a
first pass at general line handling stuff that could be used by less and by
shell line editing and by vi and so on, but people wrote a vi that does not and
never will share code with the rest of those so that's off the table
permanently. I put that on a back burner until I do shell command editing and
history (which needs to know the indentation from the prompt via the ANSI escape
that sends back a current position, but what should it do when the terminal
doesn't respond to that? People can type ahead before the prompt is even written
and the probe reply gets appended to that so it can come in at any point,
there's some redrawing. In bash there's a horrible \[ and \] escape syntax
around "non printing" characters which you can use to manually fudge its
measurement of ANSI escapes, but it still gets it wrong when the previous
command didn't end with a newline ala "echo -n potato" and I want to get it
right...)
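
For reference, the probe mentioned above is the ANSI "device status report" (ESC [ 6 n), and a terminal that supports it answers with ESC [ ROW ; COL R. A minimal sketch of parsing that reply once it has been read, assuming the reply is already in a string and that any typeahead in front of it contains no '[' (the function name parse_cpr is hypothetical):

```shell
# Parse a cursor position report of the form ESC [ ROW ; COL R.
# Prints "ROW COL" on success, returns 1 if the string is not a report.
parse_cpr() {
  reply=$1
  body=${reply#*\[}                  # drop everything through the first '['
  [ "$body" != "$reply" ] || return 1
  body=${body%%R*}                   # drop the trailing 'R' and what follows
  row=${body%%;*}
  col=${body#*;}
  [ "$row;$col" = "$body" ] || return 1   # exactly one ';' separator
  case "$row" in ''|*[!0-9]*) return 1 ;; esac
  case "$col" in ''|*[!0-9]*) return 1 ;; esac
  printf '%s %s\n' "$row" "$col"
}
```

This deliberately leaves out the hard parts described above: putting the tty in raw mode, timing out when the terminal never answers, and separating the reply from real typeahead.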

Anyway, there's users of this in "top" and in hexedit's ctrl-F input and so on.
Except hexedit's ctrl-f appears to have broken since I wrote it, how nice...

> We could probably merge more into that and make them share a common base 
> either as 2 functions in
> the same file, or just make more a OLDTOY pointing to less like how [ points 
> to [[.

I've done stuff on this and it turns out the two don't share nearly as much
logic as you'd think. A quick grep finds:

https://landley.net/notes-2018.html#10-10-2018

https://landley.net/notes-2017.html#31-12-2017

> less seems moderately easy (Read lines into list, show lines, scroll down and 
> read more lines,
> scroll up and go back in list.)

It seemed easy to me too, until I sat down to try to understand its behavior.

> Especially since we already have vi.c which does almost all the stuff
> less does at its core, there's a lot of potential code-sharing (To which
> file though, lib/tty.c?).

I have zero input into the vi.c in pending. I was too slow, so it was taken out
of my hands. I pass along changes to it without particularly reviewing them, it
shares no code with any other command in toybox, and trying to clean it up or
promote it has been punted to a post-1.0 item along with implementing "screen"
and similar.

> But I have to ask the question "If it's so easy, why isn't it in toybox yet?" 
> Is it just because
> other TODO items taking up time, or is it because it's harder to implement 
> than it seems.

Because I care about edge cases like ansi escapes and utf8 fontmetrics and
resizing the screen partway through displaying, 

Re: [Toybox] mount option to show what a loopback mount is backed by?

2024-03-19 Thread Rob Landley
On 3/18/24 20:16, enh via Toybox wrote:
> mount currently shows something like:
> 
> /dev/block/loop86 on /apex/com.android.hardware.tetheroffload@1 type ext4
> (ro,dirsync,seclabel,nodev,noatime)
> 
> but often the user wants to know what "loop86" refers to. and it's unlikely 
> they
> know to look in /sys/block/loop*/loop/backing_file.

Toybox doesn't implement "losetup -l" yet, but it does losetup -s on specific
loop devices. I'm assuming the /sys interface is so A) you don't need to be
root, B) it returns more than 64 bytes.

Requiring the loop device to be mounted rather than associated to get info on it
is limiting, but mount providing a convenience function is reasonable...
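
A minimal sketch of the sysfs lookup being discussed, assuming the /sys/block/loop*/loop/backing_file layout quoted above. The function name loop_backing is hypothetical, the directory argument exists only so the function can be pointed at a fake tree, and the printed /dev/loopN name is an assumption (Android uses /dev/block/loopN):

```shell
# Print "DEVICE BACKING_FILE" for every loop device that has a backing file.
# $1: sysfs block directory (defaults to /sys/block).
loop_backing() {
  sysblock=${1:-/sys/block}
  for f in "$sysblock"/loop*/loop/backing_file; do
    [ -r "$f" ] || continue          # unmatched glob or detached loop device
    dev=${f#"$sysblock"/}            # loopN/loop/backing_file
    dev=${dev%%/*}                   # loopN
    path=
    IFS= read -r path < "$f" || :    # first line of the file is the path
    printf '/dev/%s %s\n' "$dev" "$path"
  done
}
```

Unlike the mount output, this works for loop devices that are associated but not mounted, and does no escaping of unusual characters in the path.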

> afaik, though, there's no coreutils option to show this, so it would involve
> making up a new command-line option as well as some new output...

Or just always hallucinating it into a file=/path/to/blah argument in the
parenthetical, maybe? (With appropriate kernel-style %20 escapes for space and
parentheses and comma and such, I forget which characters the kernel already
escapes...)

Let's see... https://www.kernel.org/doc/Documentation/admin-guide/devices.txt
says block devices with major 7 are loop devices, doesn't limit the minor 
range...

Ok, first attempt (commit c1fb95a3d859) doesn't have escape logic if the path
has weird characters in it, but it should be obvious where to add that if it
comes up. Look reasonable?

Rob


Re: [Toybox] interesting new (?) env(1) options

2024-03-18 Thread Rob Landley
On 3/17/24 15:10, Ivo van Poorten wrote:
>> In THEORY each man page has a "name" section with a one line
>> description, and there should be a way to emit them all, but if
>> there's a standard way to do it without writing a shell script I
>> dunno what it is. I generally just do
> 
> It looks like apropos is just man -k.
...
> Lots of not so useful output though:
> 
> $ apropos ls | wc -l
> 548
> $ man -k ls | wc -l
> 548

And "man -k ." lists them all. Good to know.

$ man -k . | wc -l
8645

Yeah, lots of debris, but that's a distro issue. No obvious way to limit it by
section either...

$ man -k . | sort -t'(' -k2,2n | less

Eh, sort of reasonable-ish? Assuming you care about allcm, bibdoiadd and
blueman-report...
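
One way to limit that list by section is to filter on the parenthesized field. This is a minimal sketch assuming each "man -k" line looks like "name (section) - description"; the function name man_section is hypothetical. Some man implementations also document a section option for apropos, which would be preferable where it exists:

```shell
# Keep only the "man -k" lines whose section exactly matches $1.
# Splitting on parentheses makes field 2 the section of each entry.
man_section() {
  awk -F'[()]' -v sec="$1" '$2 == sec'
}
```

Usage would be along the lines of: man -k . | man_section 1 | less. Because the match is exact, section "3" does not pick up entries filed under "3caca".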

Thanks,

Rob


Re: [Toybox] hexdump tests.

2024-03-18 Thread Rob Landley
On 3/15/24 16:24, enh wrote:
> Sure, but that said some tests _DO_ care about the exact amount of 
> whitespace
> (are columns aligned), or tabs vs spaces.
> 
> i know what you mean, but at the same time, i'm struggling to think of a 
> single
> case i've been involved with where the "upstream" tool hasn't screwed me over 
> by
> doing something stupid sooner or later...

Yup. And yet...

I'm thinking maybe strip _trailing_ whitespace? It's not user-visible and I
can't think of an instance where it's semantically relevant. (LEADING whitespace
is semantically relevant all the time, interstitial a lot too. But trailing
generally shouldn't BE there...)
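
A minimal sketch of such a filter, assuming it runs over test output before comparison (the function name strip_trailing is hypothetical; leading and interstitial whitespace are left alone):

```shell
# Remove whitespace at the end of every line read from stdin.
# [[:space:]] is used instead of [ \t] because \t inside a bracket
# expression is not portable across sed implementations.
strip_trailing() {
  sed 's/[[:space:]]*$//'
}
```

Note that [[:space:]] also removes a trailing carriage return, which may or may not be wanted for a given test.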

> CANONICALIZE_SPACE_IF_RUNNING_HOST_VERSION=1? so we trust ourselves but no-one
> else? :-)

I _don't_ trust myself, and I'm not special. (That's policy.)

> The problem is "dump hex" isn't a big enough job that pulling it out into 
> a
> library function that can be shared is really a win. It's another one of 
> those
> "the fighting is so vicious because the stakes are so small" things. 
> Maybe if I
> could genericize the "show hex in 4 digit groups, now do octal!" variants 
> into
> some sort of engine... but I worry that the glue to call the engine would 
> be
> bigger than any savings.
> 
> od and hexdump are weird there in that the former lets you express quite a 
> large
> variety of different dumps, and the latter (i think) pretty much anything. i
> have wondered whether the others can't mostly be written in terms of hexdump.
> (xxd still has all the reverse stuff, but as long as no-one else does, that's
> not duplication.)

Yeah, it _seems_ like there's something I can do there, but I'm tired of being
distracted by it.

> *shrug* Punt that for a potential post-1.0 cleanup pass, and lump it in 
> the
> meantime...
> 
> yeah, like you say, these are some of the simplest commands anyway. i'd be a 
> lot
> more worried if we had four seds or four shells :-)

At the end of my tenure, busybox had FIVE shells, although that last one was my
fault and two of them were the "xkcd standards" problem.

Erik did lash (lame-ass shell) to be tiny, Ash was the bigass lump of complexity
copied out of debian or some such and nailed to the side of the project by that
insane Russian developer who never did learn English and communicated entirely
through a terrible translator program (so any conversation longer than 2
sentences turned into TL;DR in EITHER direction, he was also hugely territorial
about anybody else touching "his" code), and msh was the minix shell mostly used
on nommu systems.

Somebody then started hush as the "one shell to rule them all" replacement but
work on it petered out. Not sure whose baby that was because the entire busybox
community collapsed at about the same time: Erik Andersen ran a startup and got
so overworked his marriage nearly collapsed, Manuel Nova's girlfriend died,
Glenn McGrath tried a GPL enforcement action down in Australia/New Zealand and
it left such a bad taste in his mouth he quit open source development entirely,
Mike Frysinger started maintaining separate for-profit forks of every project he
touched and never pushing anything upstream which eventually resulted in the
blackfin architecture (his dayjob) being declared dead and yanked from
linux/arch and never even making it into qemu... And that's ignoring the whole
uclibc->buildroot saga...

*shrug* Hush dying was pretty minor in context: the busybox community imploded
and I stepped in to prop up what I could until Bruce went "you, volunteer who is
mopping the floors, you're doing it wrong, do it MY WAY, I have _seniority_ and
you've been doing everything in my name all along anyway whether you know it or
not"...

Anyway, before all that happened I printed out the bash man page into a 3 ring
binder to read on the bus and started my own "one shell to rule them all",
bbsh.c, and work ended on that when Bruce chased me off busybox. Denys removed
it pretty early on in his tenure, but as far as I'd gotten was what was checked
in to pending until the current round of shell work started...

> Yes I saw your email in the other thread about pending not being granular
> enough, but didn't really have anything coherent to say in response? I see
> pending as an unfinished todo heap I need to drain, and I feel bad for not
> cleaning it up fast enough. Doing non-cleaning work there is like 
> organizing
> trash piles. Attempting to categorize the bulk wasn't an unambiguous win 
> even
> for toys/ which is _intended_ to keep growing rather than shrink, so 
> adding it
> to pending doesn't appeal. I don't really want to spend architectural design 
> cycles
> on scaffolding that gets torn down again.
> 
> indeed.
> 
> i think the only half-way practical idea i had was "keep pending but just 
> switch
> to a much scarier name".

I need to clean it all up. I just haven't quite gotten my groove back
post-pandemic and people 

Re: [Toybox] interesting new (?) env(1) options

2024-03-17 Thread Rob Landley


On 3/15/24 17:48, enh wrote:
> 
> 
> On Fri, Mar 15, 2024 at 3:43 PM Rob Landley <r...@landley.net> wrote:
> 
> On 3/15/24 16:27, enh wrote:
> >     That said, I'm not implementing --longopts without a short opt until
> somebody
> >     comes to me with a use case. And then I would come up with short
> options. Also,
> >     "env" is probably the wrong place to put this unless it's also going
> to sprout
> >     flags to change niceness level and so on: man 7 signal and man 7
> environ are
> >     different pages.
> >
> > i know what you mean, but at the same time, the fact that this swiss 
> army
> knife
> > doesn't actually exist is a bug in its own right.
> 
> A swiss army knife tool is not unix. Unix is "do one thing and do it 
> well", and
> then combine them either with pipes or via:
> 
>   nice -n 20 env -i potato=wombat nohup nsenter -m chroot thingy /bin/blah
> 
> Having a "sigfilt -INT +HUP cat /proc/potato" in a list like that doesn't 
> seem
> like a big lift. (Recycling the "kill" plumbing to understand signal 
> names, and
> the -block +unblock stuff from shell options, maybe with the magic name 
> ALL
> being every signal. Seems like it might also want a taskset/chmod style
> set/unset mask, but I'd want to think about that? Or wait for somebody to
> actually need it, since +ALL and -ALL seem like the default use cases...)
> 
> If you were going to extend any of the existing commands above to fiddle 
> with
> more signals, "nohup" would be the obvious choice, not "env". (Modulo the 
> silly
> stdin/stdout redirection nohup does.)
> 
> > i regularly find people who
> > don't realize which of these things there is/isn't a command for. 
> (because not
> > only are they separate commands, even the man pages don't generally 
> refer
> to one
> > another. because, like you say, in a sense they _don't_ have anything 
> much
> to do
> > with one another.)
> 
> Back in college they had xerox pages with one line summaries of commands, 
> which
> you could "man command" on the machine for.
> 
> 
> i remember when apropos(1) was useful...
> 
> ~$ apropos nice
> nice: nothing appropriate.
> ~$ 
>  
> :-(

Sadly apropos requires some silly cron job to create a database, but instead of
adapting that for the invention of package management (so the database got
updated when packages were installed/removed) they left it as a cron job, which
bogged down everybody's systems and got shot in the head when laptops overtook
desktops. (Ooh, I wrote a rant about that long ago, I think laptops outsold
desktops in dollar terms in 2004 and unit volume in 2005, but it's been long
enough I don't remember, and that was pre-twitter and before my website blog, so
the rant with actual reference URLs would have been either on livejournal or
some mailing list...)

That said, there's no real reason NOT to update apropos to work based on job
control. (Modulo rebuilding the whole database each package install was
considered too expensive and nobody ever did a delta version.)

In theory "man man" should tell you to look at "man 1 intro", but for some
INSANE reason that still mentions the existence of "info" files (based on the
gopher protocol) and thus 100% irrelevant as of about 1996.

There SHOULD be a tool to list all available commands with a one line summary
the way "aptitude search ." can show you every debian package in the repository
(which is how I know, piping it to wc, there are 74,750 of them in devuan
botulism, yes I still need to upgrade to diptheria).

In THEORY each man page has a "name" section with a one line description, and
there should be a way to emit them all, but if there's a standard way to do it
without writing a shell script I dunno what it is. I generally just do

  ls /usr/share/man/man[0-8]

Although...

$ for i in 1 2 3 4 5 6 7 8; do for j in /usr/share/man/man$i/*; do man $i $j |
grep -A 1 '^NAME$' | tail -n 1; done; done

That sort of almost works except the output is full of:

troff: :330: warning [p 3, 10.8i]: can't break line

warning: file '', around line 636:
  table wider than line width

But it turns out the problem is that the man pages directory is FULL OF CRAP. I
mean seriously, what is:

$ ls /usr/share/man/man*/* | grep beta19
/usr/share/man/man3/_build_libcaca-MxUtPZ_libcaca-0.99.beta19_caca_.3caca.gz
/usr/share/man/man3/_build_libcaca-MxUtPZ_libcaca-0.99.beta19_caca_codec_.3caca.gz
/usr/share/man/man3/_build_libcaca-MxUtPZ_libcaca-0.99.beta19_caca_driver_.3caca.gz
/usr/share/ma

Re: [Toybox] interesting new (?) env(1) options

2024-03-15 Thread Rob Landley
On 3/15/24 16:27, enh wrote:
> That said, I'm not implementing --longopts without a short opt until 
> somebody
> comes to me with a use case. And then I would come up with short options. 
> Also,
> "env" is probably the wrong place to put this unless it's also going to 
> sprout
> flags to change niceness level and so on: man 7 signal and man 7 environ 
> are
> different pages.
> 
> i know what you mean, but at the same time, the fact that this swiss army 
> knife
> doesn't actually exist is a bug in its own right.

A swiss army knife tool is not unix. Unix is "do one thing and do it well", and
then combine them either with pipes or via:

  nice -n 20 env -i potato=wombat nohup nsenter -m chroot thingy /bin/blah

Having a "sigfilt -INT +HUP cat /proc/potato" in a list like that doesn't seem
like a big lift. (Recycling the "kill" plumbing to understand signal names, and
the -block +unblock stuff from shell options, maybe with the magic name ALL
being every signal. Seems like it might also want a taskset/chmod style
set/unset mask, but I'd want to think about that? Or wait for somebody to
actually need it, since +ALL and -ALL seem like the default use cases...)

If you were going to extend any of the existing commands above to fiddle with
more signals, "nohup" would be the obvious choice, not "env". (Modulo the silly
stdin/stdout redirection nohup does.)
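
The "ignore" half of that can already be composed from the shell, because an ignored signal disposition is inherited across exec. A minimal sketch (the function name run_ignoring is hypothetical, and this is not the proposed sigfilt): it cannot BLOCK a signal, since changing the signal mask needs sigprocmask(), which plain shell does not expose.

```shell
# Run a command with one signal ignored.
# trap '' sets the disposition to "ignore" in the subshell, and ignored
# dispositions survive exec, so the command starts with it ignored.
run_ignoring() {
  sig=$1
  shift
  ( trap '' "$sig"; exec "$@" )
}
```

So something like run_ignoring INT cat /proc/potato would slot into a command chain the same way nice or nohup do.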

> i regularly find people who
> don't realize which of these things there is/isn't a command for. (because not
> only are they separate commands, even the man pages don't generally refer to 
> one
> another. because, like you say, in a sense they _don't_ have anything much to 
> do
> with one another.)

Back in college they had xerox pages with one line summaries of commands, which
you could "man command" on the machine for.

For a while https://man7.org/linux/man-pages/dir_section_1.html and friends were
useful but then he dumped 8 zillion unrelated packages into them, reproducing
Yogi Berra's "nobody goes there anymore, it's too crowded". (I mean seriously,
linking to https://man7.org/linux/man-pages/man1/pmdasolaris.1.html and
https://man7.org/linux/man-pages/man1/ibv_srq_pingpong.1.html in the first page
of results does NOT help anyone do anything, ever.)

My contribution to that was https://landley.net/toybox/help.html and an
intention to do videos (which got hung up on prudetube imploding, but I should
get on with it anyway).

In theory "toybox help -au" provides a similar "one line per command" list, but
that has option usage instead of a brief summary of what the command DOES. There
_is_ such a brief summary for most commands at the top of each source file, ala:

$ sed -ns '1p' toys/*/*.c | head -n 10
/* getenforce.c - Get the current SELinux mode
/* load_policy.c - Load an SELinux policy file
/* log.c - Log to logcat.
/* restorecon.c - Restore default security contexts for files
/* runcon.c - Run command in specified security context
/* sendevent.c - Send Linux input events.
/* setenforce.c - Set the current SELinux mode
/* demo_many_options.c - test more than 32 bits worth of option flags
/* demo_number.c - Expose atolx() and human_readable() for testing.
/* demo_scankey.c - collate incoming ansi escape sequences.

But that doesn't become part of the help text (maybe it should?) and doesn't
cover multiple commands in the same source file. And xzcat.c does it wrong.

Anyway: if a "toybox cheat sheet" seems like a good thing, we're like 80% of the
way there already?
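
A minimal sketch of turning those first-line comments into cheat sheet entries, assuming every line has the "/* name.c - summary" shape shown above (the function name summaries is hypothetical, and lines that do not match are silently dropped):

```shell
# Convert "/* name.c - summary" lines on stdin into "name: summary".
summaries() {
  sed -n 's|^/\* *\([^ ]*\)\.c - \(.*\)$|\1: \2|p'
}
```

Something like sed -ns '1p' toys/*/*.c | summaries would then give one line per source file. It still names files rather than commands, so it has the same gap for source files that implement several commands.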

Rob


Re: [Toybox] interesting new (?) env(1) options

2024-03-15 Thread Rob Landley
On 3/15/24 13:40, enh via Toybox wrote:
> i've never noticed these before:
> 
>        --block-signal[=SIG]
>               block delivery of SIG signal(s) to COMMAND
> 
>        --default-signal[=SIG]
>               reset handling of SIG signal(s) to the default
> 
>        --ignore-signal[=SIG]
>               set handling of SIG signal(s) to do nothing
> 
> i've not yet _needed_ them, but fyi in case anyone does. (the motivating 
> example
> in the man page is "making sure SIGPIPE actually works in a child, regardless 
> of
> the caller's signal disposition".)

In 2011, I had an adventure, the punchline of which is here:

https://landley.net/notes-2011.html#05-09-2011

And the investigation leading up to it was:

https://landley.net/notes-2011.html#24-08-2011
https://landley.net/notes-2011.html#26-08-2011
https://landley.net/notes-2011.html#28-08-2011
https://landley.net/notes-2011.html#01-09-2011
https://landley.net/notes-2011.html#02-09-2011
https://landley.net/notes-2011.html#03-09-2011
https://landley.net/notes-2011.html#04-09-2011

tl;dr: autoconf was hanging in one of its tests, I safari'd through to the
actual failure reproduction sequence then traced it THROUGH the kernel to
eventually find an old bash bug (longjmp instead of siglongjmp) so when bash
took a timeout trap it left SIGALRM blocked, and my PID 1 init script had a
"read -t 3" that was doing that, meaning child processes that script started
inherited the blocked SIGALRM, which didn't cause a problem until halfway
through an autoconf build.
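
On Linux, an inherited blocked signal like that shows up in the SigBlk: line of /proc/PID/status, which is a hex bitmask where bit N-1 stands for signal number N. A minimal sketch for checking one bit, assuming the mask fits the shell's integer arithmetic and that SIGALRM is signal 14, as it is on common Linux architectures (the function name sig_in_mask is hypothetical):

```shell
# Print 1 if signal number $2 is set in hex mask $1, otherwise 0.
# $1 is the hex string from a SigBlk:/SigIgn: line, without a 0x prefix.
sig_in_mask() {
  echo $(( (0x$1 >> ($2 - 1)) & 1 ))
}
```

For example, grep SigBlk /proc/$$/status shows the mask of the current shell, and a mask of 0000000000002000 means signal 14 is blocked.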

That said, I'm not implementing --longopts without a short opt until somebody
comes to me with a use case. And then I would come up with short options. Also,
"env" is probably the wrong place to put this unless it's also going to sprout
flags to change niceness level and so on: man 7 signal and man 7 environ are
different pages.

Rob


Re: [Toybox] hexdump tests.

2024-03-15 Thread Rob Landley
On 3/15/24 09:58, enh wrote:
> On Thu, Mar 14, 2024 at 9:25 PM Rob Landley  wrote:
> > I could make ours do that, or I could export NOSPACE=1 at the start of the 
> > test
> > (since each one runs as a child process now meaning environment variable
> > assignments won't persist into other tests)
> 
> or have a NOSPACE that _doesn't_ reset?

NOSPACE already doesn't reset, I've been doing shell prefix assignment to set it
for individual commands, ala:

  $ NOSPACE=1 bash -c 'env | grep NOSPACE'
  NOSPACE=1
  $ echo -n $NOSPACE
  $

If I just NOSPACE=1 on its own line it should apply to all the tests in the
current script, then clean up at the end because each test script runs as a
(child process) with its own environment.

And that's probably correct here because diff -b ignores changes in the AMOUNT
of space but not the EXISTENCE of space, so "a b c" and "abc" still wouldn't
match, which is what I was worried about. (I thought -b yanks space, but instead
it normalizes it.)
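
That distinction is easy to check directly. A minimal sketch, assuming a diff that implements -b as POSIX describes it (the function name same_modulo_space is hypothetical):

```shell
# Return 0 if the two strings are equal according to "diff -b",
# i.e. differing only in the amount of whitespace, and nonzero otherwise.
same_modulo_space() {
  a=$(mktemp) b=$(mktemp)
  printf '%s\n' "$1" > "$a"
  printf '%s\n' "$2" > "$b"
  diff -b "$a" "$b" > /dev/null
  rc=$?
  rm -f "$a" "$b"
  return $rc
}
```

With this, "a b c" and "a    b c" compare as the same, while "a b c" and "abc" do not.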

> i feel like most of the tools that
> produce human-readable output have this problem otherwise: (a) upstream often
> has a weird duplicative implementation that leads to bizarre behavior and (b)
> they keep changing things (possibly without even noticing, since it's 
> whitespace).

Sure, but that said some tests _DO_ care about the exact amount of whitespace
(are columns aligned), or tabs vs spaces.

> (xxd, which has the `-r` option, seems like the partial exception, but even
> there as long as xxd can consume any [text] input it's likely to be given, i
> don't think it matters exactly what the text output's whitespace looks like?)
>  
> 
> but that weakens the test, or maybe
> I could add a "NOTRAIL=1" that runs the output through sed 's/[ \t]*$//' 
> or some
> such?
> 
> Anyway, reviewing commands to promote them out of pending is a thing, and 
> I'm
> trying to close tabs rather than open more just now.
> 
> (i didn't even realize there was a hexdump in pending! missed that somehow...)

It mostly comes up when I complain we've got FOUR hex dumpers that share no code
(it's the fourth, you added the third with xxd, posix od and my hexedit were the
first two), but the recent revisit tipped me over into "alright, I'll clean it
up..."

The problem is "dump hex" isn't a big enough job that pulling it out into a
library function that can be shared is really a win. It's another one of those
"the fighting is so vicious because the stakes are so small" things. Maybe if I
could genericize the "show hex in 4 digit groups, now do octal!" variants into
some sort of engine... but I worry that the glue to call the engine would be
bigger than any savings.

*shrug* Punt that for a potential post-1.0 cleanup pass, and lump it in the
meantime...

> Rob
> 
> P.S. In THEORY the chmod +x bit on tests/*.test is the equivalent of 
> "pending",
> where "make tests" only runs the ones marked executable but you can still 
> run
> them individually/standalone with "make test_commandname". In practice, 
> the
> pending status of commands and the pending status of TESTS isn't a 1-1 
> match up.

Yes I saw your email in the other thread about pending not being granular
enough, but didn't really have anything coherent to say in response? I see
pending as an unfinished todo heap I need to drain, and I feel bad for not
cleaning it up fast enough. Doing non-cleaning work there is like organizing
trash piles. Attempting to categorize the bulk wasn't an unambiguous win even
for toys/ which is _intended_ to keep growing rather than shrink, so adding it
to pending doesn't appeal. I don't really want to spend architectural design cycles
on scaffolding that gets torn down again.

Rob


Re: [Toybox] [PATCH] readelf: fix -n for x86-64 ibt/shstk notes.

2024-03-15 Thread Rob Landley
Applied, but I don't have a local way to reproduce the issue nor did it come
with a test.

On 3/14/24 19:08, enh via Toybox wrote:
> I couldn't work out how to get gcc to actually produce such a thing, but
> /bin/dbxtool on my debian box right now has them.

I don't seem to have that command. What package does dpkg-query -S /bin/dbxtool
say it's in on your system?

Also:

-unsigned namesz=elf_int(), descsz=elf_int(), type=elf_int(), j=0;
+unsigned namesz=elf_int(),descsz=elf_int(),type=elf_int(),j=0;

Is there a reason for that? Spaces after commas is the usual in the codebase, I
think? (Both in declaration lists and function arguments. With the occasional
cheat to fit in 80 columns but that doesn't apply here. I mean, I applied it
anyway, but...?)

And then you add more spaces in:

-while (descsz-j > 8) { // Ignore 0-padding at the end.
+while (descsz - j > 0) {

That one's not a strong thing, but why add the spaces around the minus there?

The "x = y;" spaces around assignment operators got drilled into me back on
busybox, along with the spaces in flow control statements that would otherwise
look like functions, ala if (potato) vs if(potato). And I mentioned the spaces
after commas.

For most of the rest of them, descsz-j>8 works just as well for me? Especially
with syntax highlighting. I lean towards smaller representations where possible,
although "consistent" beats "correct" so I have it do what the nearby code is
doing unless I'm cleaning up a function at a time.

In while (descsz-j > 8) the spaces are separating the math from the comparison
so almost earn their keep, but if you add more spaces it equally looks like
descsz - (j > 0) so I dunno what the spaces are supposed to accomplish?
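
To make the grouping question concrete: subtraction binds tighter than comparison, so spacing never changes how the expression parses. The sketch below uses shell arithmetic as a stand-in for the C expression, on the assumption that it follows C's precedence rules (POSIX specifies that it does). The values 8 and 3 are made up for illustration.

```shell
# Shell arithmetic borrows C's operator precedence, so it can stand in for
# the C expression here. The values are made up for illustration.
descsz=8
j=3

echo $(( descsz-j > 0 ))       # prints 1: parsed as (descsz-j) > 0
echo $(( descsz - j > 0 ))     # prints 1: same parse, spacing is ignored
echo $(( descsz - (j > 0) ))   # prints 7: only explicit parentheses regroup it
```

So the spaces are purely a readability choice; the compiler sees the same expression either way.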

As I said, not a strong thing. Just curious why the change. At that level it's
more aesthetics than policy, but I wonder if you've got a mental rule I'm 
missing...

> The big mistake here is that GNU property notes' data is always 8-byte
> aligned, so we needed to skip that. That lets us get rid of the existing
> loop termination hack to skip padding.
> 
> While I'm here -- since the symptom was running off the end of the file --
> I've also added a bounds check in the property dumping loop.
> 
> I wish I had fuzzing infrastructure to run AFL++ against this every time
> it changes... In lieu of that I do wonder whether we should add `readelf
> -aW /bin/* > /dev/null` as a smoke test that "at least it works for all
> the _valid_ binaries on the system you're testing on". That would have
> caught this sooner.
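
As a rough illustration of the alignment rule in the patch description above (taking its statement about 8-byte alignment at face value), rounding an offset up to the next multiple of 8 can be sketched like this. The helper name align8 is hypothetical and is not taken from readelf.c.

```shell
# Hypothetical helper: round a byte offset up to the next multiple of 8.
# Adding 7 and clearing the low three bits is the usual power-of-two trick.
align8() {
  echo $(( ($1 + 7) & ~7 ))
}

align8 13   # prints 16
align8 16   # prints 16 (already aligned, unchanged)
```

A parser that skips to the aligned offset after each entry no longer needs a special case for trailing zero padding, which is presumably why the old loop termination hack could go.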

At a design level the test suite tries to be self-contained and deterministic.
Even the TEST_HOST option to sanity check the tests' validity against the host
commands' behavior is annoyingly made of heisenbugs. (Plus the filesystem you run
it on can change behavior enough to break things, we've had kernel version skew
at least once, and the bsd tests are handwaves at best.)

A test where you honestly don't know what input data you're running it on
wouldn't really belong in "make tests", that would be some other test. Maybe a
scripts/probes/test-readelf or something? (It sort of conceptually fits with
GLOBALS and findglobals. Questions I occasionally want to ask, but have not yet
tried to automate understanding of the results.)

I could see having such a test as part of the automated LFS build, and I could
see running it against the results of an android build. But neither use case
seems a good fit for "cd toybox; make tests" being the delivery mechanism.

Also, if you're gonna do it, traverse $PATH instead of a hardwired /bin. There's
a script snippet to do that in mkroot/record-commands already:

  tr : '\n' <<< "$PATH" | while read i; do
    find "$i" -type f,l -maxdepth 1 -executable -exec basename {} \; | \
      while read FILE; do ln -s logpath "$WRAPDIR/$FILE" 2>/dev/null; done
  done

Which now that I look at it should probably have ${FILE##*/} instead of the
calls to basename. (Noticeably faster not to spawn a hundred child processes...)
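
A minimal sketch of the ${FILE##*/} idea, under some assumptions: it swaps the find invocation for plain shell globbing to stay portable, and the function name list_path_commands is made up for illustration. It is not the mkroot/record-commands code.

```shell
# Hypothetical helper: print the name of every executable regular file found
# directly in each $PATH entry. The ( ) body runs in a subshell, so the IFS
# change used to split $PATH on colons does not leak into the caller.
list_path_commands() (
  IFS=:
  for dir in $PATH; do
    [ -n "$dir" ] || dir=.          # an empty PATH entry means current directory
    for file in "$dir"/*; do
      # -f follows symlinks, so links to regular files are accepted too.
      [ -f "$file" ] && [ -x "$file" ] || continue
      # Strip everything through the last slash. Same result as basename
      # for these paths, but done by the shell with no child process.
      echo "${file##*/}"
    done
  done
)
```

The expansion removes the longest prefix matching */, so with FILE=/usr/bin/ls the result of "${FILE##*/}" is ls.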

Rob


[Toybox] hexdump tests.

2024-03-14 Thread Rob Landley
Following up on commit cab0b6653827, the hexdump test suite is weird. For
example, the first test has "simple\\n" when it means "simple\n" (which
nevertheless somehow works for reasons I am loath to examine at the moment, and
I have a todo item to convert the various test arguments needing escapes to
$'blah\n' arguments where the shell does it instead of washing it through echo
inside the function, but that is a giant flag day change for some point where
I'm facing a clean tree.)
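
One likely reason the doubled backslash still works, assuming the test arguments are double-quoted as written above: inside double quotes the shell collapses \\ to a single backslash but leaves \n alone, so both spellings hand the same literal string to whatever interprets escapes later. The sketch uses printf '%b' as a stand-in for the echo inside the test function.

```shell
a="simple\\n"   # double quotes turn \\ into a single backslash
b="simple\n"    # double quotes leave \n alone: a backslash, then n

[ "$a" = "$b" ] && echo "same string"

# Interpreting the escape afterwards yields six letters plus one real newline.
printf '%b' "$a" | wc -c
```

With $'simple\n' the shell itself produces the newline, so no later escape processing is needed at all.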

But the bigger problem is that TEST_HOST doesn't pass because debian's hexdump
is doing something silly:

  $ echo -e 'simple\n' | hexdump | sed 's/$/./'
  000 6973 706d 656c 0a0a.
  008.

I.E. padding short lines with trailing spaces, because of course.

I could make ours do that, or I could export NOSPACE=1 at the start of the test
(since each one runs as a child process now meaning environment variable
assignments won't persist into other tests) but that weakens the test, or maybe
I could add a "NOTRAIL=1" that runs the output through sed 's/[ \t]*$//' or some
such?
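
A sketch of what such a filter might look like. NOTRAIL is only the name proposed above, not an existing test suite option, and strip_trailing is a hypothetical helper name. It uses [[:blank:]] because \t inside a bracket expression is not portable across sed implementations.

```shell
# Hypothetical filter for the proposed NOTRAIL=1 idea: drop trailing spaces
# and tabs from each line. [[:blank:]] is the portable spelling of [ \t].
strip_trailing() {
  sed 's/[[:blank:]]*$//'
}

# A short line padded with trailing spaces, then marked the same way as above.
printf '0000008   \n' | strip_trailing | sed 's/$/./'   # prints 0000008.
```

Unlike NOSPACE=1, this only discards whitespace at the end of a line, so differences in spacing between fields would still be caught.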

Anyway, reviewing commands to promote them out of pending is a thing, and I'm
trying to close tabs rather than open more just now.

Rob

P.S. In THEORY the chmod +x bit on tests/*.test is the equivalent of "pending",
where "make tests" only runs the ones marked executable but you can still run
them individually/standalone with "make test_commandname". In practice, the
pending status of commands and the pending status of TESTS isn't a 1-1 match up.


Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free

2024-03-14 Thread Rob Landley
On 3/13/24 18:19, Oliver Webb wrote:
>> If you want to complain, I do tend to have "s", "ss", and "sss" as an
> 
> I did complain about that in the email. 

I noticed.

>> You're basically complaining there was an "i" variable that wasn't a loop
>> index.
> 
> That IS a problem,

Not "I have a problem", but "it IS a problem". Well it's good that you're here
to define objective reality. Without such a universal arbiter we would all be
lost in a world of opinion and nuance.

Me, I tend to have a gap between being able to tell "this isn't ideal" and
knowing what ideal would look like. Yelling at me "this is a mess, clean it up,
how top it NOW because I'm in charge of you!" doesn't really help the second 
part.

> and when breaking/mixing the few conventions there are happens _constantly_,
> with local variables that interact with each other. That becomes a much 
> larger problem.

"There are only two hard things in Computer Science: cache invalidation and
naming things." -- Phil Karlton

Do I need to tap the sign?

> "i" is usually a counter used for at most a few dozen lines, "ss" and friends 
> span regularly for
> hundreds of lines.

They didn't span hundreds of lines when I started writing the function. The
function changed over time. The function is not finished yet. Gratuitous
refactoring makes "git annotate" and friends useless.

(I remember once when I was maintaining a tinycc fork, I introduced a subtle bug
that took me days to track down by REINDENTING the code because I screwed up a
curly bracket placement. I do not yet have what I consider a load bearing test
suite for this command.)

>> In multiple places, f stands for function, and there's more than one aspect 
>> to
>> functions.
> 
> I'm sure to someone who wrote 99% of it everything makes so much more sense.

Tron: If you are a User, then everything you've done has been according to a
plan.

Kevin Flynn: Ha! You wish! Well, you know what it was like. You just keep doin'
what it looks like you're supposed to be doin', no matter how crazy it seems.

> Just like
> the people who maintain GCC and the coreutils and IOCCC entries.

Carefully calculated to flatter me into agreeing with your opinions, I see.

>> > The problem isn't the length as I said, the problem is that there is no 
>> > convention for the naming
>> > of these.
>> 
>> Maybe I should move the file to "pending"?
> 
> This is the most important command in the entire project (you can't _do 
> anything_ on a machine without
> a shell), No other pending command I've ever seen is this in-auditable, even 
> ones written by you.

"Even ones written by" me. How nice.

>> And as I said: if I were to apply an aesthetic patch which does nothing but
>> make the code smell like you, toysh would be all yours and I would never touch
>> it again.
> 
> Make it look like whatever you want, I honestly couldn't care.

Oh good, I can stop reading then.

Rob

P.S. Are you aware that you have opinions? That you're not an ordained judge of
what is and is not correct? No, why should I ask questions: you didn't.

To answer some questions you didn't ask: A _better_ name for a variable is not
always immediately obvious to me as I'm writing. The uses of a given variable
tend to change as the function develops. I tend to perform cleanup passes to
make the code more intelligible AFTER it's otherwise working (and has a full
regression test suite it passes so I can catch bugs introduced by refactoring).
Things that are used a LOT tend to have short names in my code, because
BigLongCamelCaseName repeated 400 times in a function does not make code easier
for _me_ to manage or understand, and variable names get out of date the exact
same way comments get out of date. Obviously, your experience differs, and I
look forward to the shell you write from scratch that renders toysh unnecessary.
It should be easy for you, given you can dictate empirical truth in a way that
obviates judgement. Me, I'm muddling through as best I can...


Re: [Toybox] [PATCH] watch: flush the buffer each round.

2024-03-13 Thread Rob Landley
On 3/12/24 12:24, enh wrote:
> On Tue, Mar 12, 2024 at 8:45 AM Rob Landley  wrote:
>>
>> Hello from Minneapolis: https://mstdn.jp/@landley/112078501045637288
>>
>> Still _really_ fried from my move, but at least chipping away at the email
>> backlog...
> 
> at least it's the same timezone :-)

That helps less than you'd think. (Flying to Japan and back is a day's recovery
time.)

>> My original approach to FILE * was to just always flush (ala xprintf) but there
>> were objections to that, which I said would open a can of micromanagement worms.
>> Then I changed the default buffer type to be unbuffered instead, and there were
>> objections to that, which I said would open a can of micromanagement worms...
> 
> (well, yeah, the "don't actually flush" argument was weird. but
> there's nothing inherently wrong with "flush and exit on failure", and
> for those places that _do_ have an opinion on when is a good time to
> flush, using that rather than directly using fflush would be
> reasonable?)

I thought xprintf() doing a flush was reasonable, and every single manual
flush() was a wart. Clearly my design sense is inapplicable to this area.

>> > but none of this
>> > is relevant here --- the problem here is that we're not flushing _at
>> > all_, so ToT watch is just broken.
>>
>> Micromanagement, yes.
> 
> not really. there's an obvious point for watch to flush --- "that's
> this round of output done, and i'm an interactive program". no
> heuristic is going to be right for that because it's going to depend
> on stuff like "what's the update frequency? how much attention is the
> user paying? what's the content?". `watch ls -l` with a 5s frequency
> being a second or so behind is fine. `watch date` with a 1s frequency
> not.

Naming things and cache invalidation: when dealing with a famously hard part,
choose a strategy of requiring explicit manual management. Not because the
language requires it, but because we volunteered as a performance optimization.

>> >> This code is mixing dprintf(1, ...) with printf() and fflush(stdout) in a way
>> >> that seems unlikely to end well, and I would like to think it through when I'm
>> >> not moving house. (Or have somebody whose brain isn't jello reassure me that
>> >> it's coherent.)
>> >
>> > not in the ToT copy it isn't. maybe you have local changes?
>>
>> Hmmm, yup looks like I was already fixing this command up to not use FILE * at
>> all anymore (what is this, my fourth attempt to get away from FILE * buffering
>> damage in a systematic way?) and got distracted partway through...
> 
> what's the motivation in watch? i don't see any benefit?

I don't remember at the moment (still fried), but probably just gradually
switching to "never use stdio for anything" now that it's buffering?

Back before the default of calling printf() was to produce no output without
remembering to say "simon says", you could freely mix file descriptor writes and
file pointer writes, so I didn't have to CARE if crunch_escape() in lib/ was
calling fputs() or write() behind the scenes.

But now I need to know. Now I need to retroactively go through and retrace every
single output path in every single command, and as with the
dprintf/printf/dprintf example pasted above (whether checked in or not), you
can't just stick a flush in a loop once and expect stuff to go out in the right
order.

I'm aware of the argument "use FILE * for everything, convert even tee to work
based on FILE * instead of filehandles, and then it's all consistent". Which is
the danegeld systemd/windows/apple argument: this tool is so terrible
interoperating with ANYTHING else that it must purge all other tools from its
ecosystem, so bow before it and obey the imperative to cleanse the unbelievers.

If I reacted well to that argument, I wouldn't have wound up on Linux in the
first place...

> stdio with
> explicit flush is _great_ for this use case, where the caller has a
> specific "and i'm done for now" point that _it_ knows.

When start_redraw() calls terminal_probesize() it outputs the probe sequence
through xprintf() but doesn't do a fflush(). (Before the buffer type changed, it
didn't need to. Before xprintf() lost its flush, it didn't need to.) So the
query to see if we got a response or not isn't going to get a response, so the
first screen draw will not take that size query into account when operating
across a serial terminal or qemu console. But there's a fallback path (assuming
80x25) so we may not notice the code being subtly broken.

We will find places the code is subtly br

Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free

2024-03-13 Thread Rob Landley
On 3/8/24 21:22, Oliver Webb via Toybox wrote:
> TL;DR: Rant about sh.c's variable names I forgot to include in main email, I
> have a patch to start fixing it but it conflicts with other stuff and I have
> to re-do it
> 
> Reading through sh.c, most of the variable names are 2 letters long 
> (repeating the same letter),

I switched from single character local variables to double character ones
because they're easier to search for without an elaborate IDE trying to parse
the code.

> for short functions (<20 lines) this isn't a problem. For example, I don't 
> need a more
> descriptive name for the variable "c" in getvar() to understand it because I 
> can look at the uses
> of it and infer from context it's about variable types. Longer functions 
> where there are a lot of variables
> interacting with each other are a lot harder to keep track of.
> 
> For example, what does "cc" do in sh_main?

Hold the -c argument. The first assignment to it is:

cc = TT.sh.c;

> From the looks of it. It's the name
> of command/file being processed, but to _know_ that's the case I have to audit
> 50 lines of code, and check over seeing if that's what it really does.

Variables sometimes get reused to mean slightly different things, and their use
can change over time when a lot of editing is done to a function. Especially for
code that isn't finished yet, there can be unpolished bits.

Although in this case, "-c is held in the variable c but I generally double such
variable names to give vi's text search at least SOME chance of finding it"
seems reasonable to me?

If you want to complain, I do tend to have "s", "ss", and "sss" as an
alternative to "s1", "s2", and "s3" when the long descriptive name for the
variable would otherwise be just "string". That gets a bit awkward at times and
I've looked at renaming some but not come up with better names.

> When a name
> like "in_source", "inputsrc", or just "inname" would tell you that without 
> having to look
> over the code and make sure that's the case.

No, you still have to look at the code. Especially when the code is not
finished, as evidenced by it still being in pending/ and not passing half its
test suite. (Let alone my giant pile of "turn these into proper shell tests",
attached...)

And "c" holding the argument of "-c" is still probably a bad example to base
your rant on. And a command that's _flagrantly_ not finished yet may also be a
bad example...

> The problem isn't the length, the problem is that there isn't any convention 
> to it and a lot
> of names convey little to no information ("ss" is usually a string, but "what 
> does the string do?"
> is a question you need to commonly ask to find out what it's doing). It'd be 
> like if run_lines()
> was called "rr()" or "expand_arg" was called "ea()", we name functions just 
> fine (which makes
> this code somewhat readable, at least it's general structure)

Would you like me to stay out of your way while you take over toysh and refactor
it to your heart's content? I note that I will never touch it again if I hand it
over.

> "ff" is a common name for file descriptors.

In this idiom it's "f, but more searchable".

You're basically complaining there was an "i" variable that wasn't a loop index.

> So working down from sh_main, One could assume that 
> "TT.ff" must have something to do with file management, right?... No, it's a 
> doubly linked list
> of overhead for shell functions,

In this case it sounds like f stands for function, yes. Many words start with f.

> and "struct sh_fcall" is often called "ff" in functions that use
> it, we have 2 common names that do completely different things...

In multiple places, f stands for function, and there's more than one aspect to
functions.

> The problem isn't the length as I said, the problem is that there is no 
> convention for the naming
> of these.

Maybe I should move the file to "pending"?

> No pattern that you can follow, you usually have to go to the start of the 
> function
> (which can be hundreds of lines up, depending on what you are looking 
> through) and see how it gets
> assigned (And interacts with other variables that have the exact same problem)
> 
> I'd try to replace a lot of these names in a patch (I have already done so 
> for sh_main in a patch
> that I didn't send because getting the compiler to shut up was more 
> important, will send the patch
> when I'm sure it won't conflict with anything)

And as I said: if I were to apply an aesthetic patch which does nothing but
make the code smell like you, toysh would be all yours and I would never touch
it again.

> but to be accurate in renaming I have to understand
> what they are, (The problem I'm trying to fix) which requires either better 
> names to begin with or
> extensive auditing and debug printf-ing of the code that takes up a lot of 
> time.

Clearly I have been too slow to finish this, and it's time for you to take over.
Your aesthetic sense is empirically superior to mine, I 

Re: [Toybox] [PATCH] watch: flush the buffer each round.

2024-03-12 Thread Rob Landley
Hello from Minneapolis: https://mstdn.jp/@landley/112078501045637288

Still _really_ fried from my move, but at least chipping away at the email
backlog...

On 3/11/24 12:27, enh wrote:
> On Fri, Mar 8, 2024 at 5:43 PM Rob Landley  wrote:
>>
>> I remember when the xprintf() family would do a flush and check for errors each
>> write. That's why code like:
>>
>> dprintf(1, "%c", pad<[...]
>> if (width) xputs(ss+(width>ctimelen ? 0 : width-1));
>> if (yy>=3) dprintf(1, "\r\n");
>>
>> Was allowable.
>>
>> I also remember when we had an xflush() that would catch errors if stdout
>> barfed, instead of calling fflush() and ignoring the return code.
> 
> yeah, i didn't understand why you removed xflush().

The commit comment on 3e0e8c687eee tried to explain: years ago xflush() got
broken to have a "don't actually flush" argument, and thus didn't do what its
name said anymore, so I renamed it to what it did (check errors and nothing
more) and added a manual flush in the places that were passing in the "actual
flush" argument. (So that change shouldn't have made it worse.)

My original approach to FILE * was to just always flush (ala xprintf) but there
were objections to that, which I said would open a can of micromanagement worms.
Then I changed the default buffer type to be unbuffered instead, and there were
objections to that, which I said would open a can of micromanagement worms...

> but none of this
> is relevant here --- the problem here is that we're not flushing _at
> all_, so ToT watch is just broken.

Micromanagement, yes.

>> This code is mixing dprintf(1, ...) with printf() and fflush(stdout) in a way
>> that seems unlikely to end well, and I would like to think it through when I'm
>> not moving house. (Or have somebody whose brain isn't jello reassure me that
>> it's coherent.)
> 
> not in the ToT copy it isn't. maybe you have local changes?

Hmmm, yup looks like I was already fixing this command up to not use FILE * at
all anymore (what is this, my fourth attempt to get away from FILE * buffering
damage in a systematic way?) and got distracted partway through...

> /tmp/toybox$ grep dprintf toys/other/watch.c
> /tmp/toybox$
> 
>> Ideally, I would wean it entirely off of the conceptually broken FILE * output
>> nonsense entirely to dprintf() and write(), because it's intentionally doing
>> progressive output and trying to micromanage cache management there is unending
>> pain. (We need a stdout with the nagle algorithm. Why did they only bother to do
>> that for network sockets?)
> 
> aye, you keep saying, but that doesn't exist yet, and watch is broken today :-)

The date on the file is February 10th, so I was working on this last month.

Alas, my life's been a bit hectic recently. Trying to shovel out now...

> that said, my previous patch was suboptimal --- there are `continue`s
> in the loop, so the flush makes more sense at the top. also i noticed
> by inspection trying to work out what you meant about mixing stdio and
> direct fd access (i don't use watch myself; this was a reported bug)
> that -b was broken.

My brain is currently trying to filk the Steven Universe song "other friends" to
be a demand about test cases and reproduction sequences, but reading the code I
see the typo.

> so the new patch (attached) is a better fix. but
> that fixes the normal case and -b for me...

Spraying down the code with manual fflush() calls strikes me as a code smell
that means "this will break the next time anybody touches anything".

But I mentioned still being fried from the move, which makes me extra irritable.
(When did U-haul become a scam? That's new since the last time I used them...)

Rob


Re: [Toybox] [PATCH] toysh: Shut up TEST_HOST, correct 3 test cases

2024-03-08 Thread Rob Landley
On 3/7/24 19:39, Oliver Webb via Toybox wrote:
> Looking at toysh again since the toybox test suite should run under it
> (in mkroot or under a chroot) A problem seems to be that there is no
> return command, which breaks runtest.sh to its core. Don't know how to add
> one in yet
> 
> On my version of bash (5.2.26) TEST_HOST fails on 3 test cases,
> and toysh also fails on those cases (Even tho toysh is doing the right
> thing, the same as bash) The attached patch changes the test file
> so that 3 test cases are resolved. And TEST_HOST works

Because Chet changed stuff I asked him about, making bash a moving target.

Breaks test_host on Devuan Bronchitis using bash 5.0.3. Lemme get back to this
one from Minneapolis. (Please poke me if I forget next week, I am completely 
fried.)

Rob


Re: [Toybox] [PATCH] watch: flush the buffer each round.

2024-03-08 Thread Rob Landley
I remember when the xprintf() family would do a flush and check for errors each
write. That's why code like:

dprintf(1, "%c", pad<[...]
if (width) xputs(ss+(width>ctimelen ? 0 : width-1));
if (yy>=3) dprintf(1, "\r\n");

Was allowable.

I also remember when we had an xflush() that would catch errors if stdout
barfed, instead of calling fflush() and ignoring the return code.

This code is mixing dprintf(1, ...) with printf() and fflush(stdout) in a way
that seems unlikely to end well, and I would like to think it through when I'm
not moving house. (Or have somebody whose brain isn't jello reassure me that
it's coherent.)

Ideally, I would wean it entirely off of the conceptually broken FILE * output
nonsense entirely to dprintf() and write(), because it's intentionally doing
progressive output and trying to micromanage cache management there is unending
pain. (We need a stdout with the nagle algorithm. Why did they only bother to do
that for network sockets?)

Rob


Re: [Toybox] [PATCH] clear.c: Clear scrollback buffer on non-vte (gnome based) terminals

2024-03-08 Thread Rob Landley
On 3/8/24 00:02, Oliver Webb wrote:
> On Wednesday, March 6th, 2024 at 01:22, Rob Landley  wrote:
>  
>> It's actually "reset" that should yank the tty out of cooked mode, which I
>> believe it does now.
>> 
>> Why we have both "clear" and "reset", I couldn't tell you.
> 
> reset sets tty settings while clear doesn't,

clear -r

*shrug* That ship has long since sailed, just...

> large enough
> difference to probably not make one a OLDTOY pointing at the other.
> Although reset does everything clear does
> 
> Looking at the man page for reset, it does even more historical stuff that
> doesn't matter today, like "-r" which is the same as
> "reset; echo Terminal type is ${TERM}."
> and "-s" which is "reset; echo TERM=$TERM;"

I'm not sure why it would feel the need to display an environment variable.
Stuff like "tty" is generally performing a system call.

>> > Looking more at ncurses clear, what does it do?
>>
>> Way too much, all of which is historical nonsense.
> 
> ncurses clear does seem to have a -x option to save the scrollback buffer.
> Not too important but it'd be one line of code to implement
> ("if (!toys.optargs) xputsn("\e[3J");")

echo -ne '\e[3J'

>> Toybox produces ansi escape sequences (well, "man 7 console_codes" really, but
>> MOSTLY ansi), and OFFICIALLY does not care what the TERM= is set to (or that it
>> is set at all) as a matter of policy.
> 
>> Toybox assumes Linux.
> 
> What is the oldest version of Linux we are planning to support?

Well, the FAQ says a 7 year support horizon.

  https://landley.net/toybox/faq.html#support_horizon

Musl doesn't work on anything older than... 2.6.26 I think? Nope, he bumped it
to 2.6.39 for some reason:

  https://wiki.musl-libc.org/supported-platforms

Toybox depends on posix-2008 for all the openat() stuff and 2.6.26 was July
2008, so ballpark of "around then, maybe". I generally just say "3.0" out of
habit, but yeah you can probably squeeze back into the 2.6.30's if you try.

Modulo we've also started to use some C11 features and 3.0 came out in July
2011. I don't THINK shoving an older kernel under that would care, but you'd
need to use a newer toolchain to target that older kernel...

> The nommu stuff seems to have only
> been done on 2.4/2.6 kernels from 15 years ago.

Um, no. Not remotely. Lots of embedded devs use older kernels because they're
way SMALLER than the modern bloated nonsense, but we support and regression test
stuff all the time.

I've booted at least linux 6.6 on physical nommu hardware (my turtle board).
Greg Ungerer and Geert Uytterhoeven regularly regression test like four
different nommu targets each release. We just had a long argument about the
Linux Test Project not having a nommu maintainer and various people both saying
"we support nommu" and "we don't really care about LTP bureaucracy", which is a
bit of catch-22...

https://lore.kernel.org/buildroot/90c1ddc1-c608-30fc-d5aa-fdf63c90d...@landley.net/T/#mf452bdcebfc29ab7a9d230275eb593b22d316ee1

(Which was of course cross-posted to like 4 different lists and each archive
only retained posts from people who were subscribed to that particular list...)

> So is "\e[3J" (An escape code added in 3.0)
> something we can use?

A) Toybox has a 7 year support horizon, 2011 was 13 years ago, 13 > 7, so I'd
assume so.

B) Added to what? The kernel's VGA text console (and/or bitmap console)
interpreting a sequence has nothing to do with XFCE's Terminal program
interpreting a sequence which has nothing to do with KDE's terminal interpreting
a sequence which has nothing to do with gnome terminal interpreting a sequence
which has nothing to do with the terminal Rich Felker wrote
(https://git.musl-libc.org/cgit/uuterm/) and there's a dozen others out there.

When was the sequence added to "man 7 console_codes"? There's a git... sigh,
google is finding
https://github.com/mkerrisk/man-pages/blob/master/man4/console_codes.4 but he
handed maintainership off over a year ago, and Google can't find the new guy.
Did I blog about this... yes I did:

https://landley.net/notes-2023.html#25-02-2023
http://lists.landley.net/pipermail/toybox-landley.net/2023-February/029469.html
https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/

And I can't figure out how to annotate through that web interface. I could
clone it and see when it was added, but I have to go back to moving house.
(Specifically, lifting boxes into a "pod". We've moved back pickup of the pod
twice now, but need the driveway for a u-haul tomorrow and my flight's scheduled
Sunday...)

Rob


[Toybox] Giving a mkroot talk at Texas LinuxFest April 13.

2024-03-06 Thread Rob Landley
Schedule just went up, 45 minute talk at 10am on the second day:

  https://2024.texaslinuxfest.org/schedule/

I need to practice to see what fits in 45 minutes, but it should be a subset of:

  https://landley.net/talks/mkroot-2023.txt

Rob

(I submitted a proposal last year before finalizing the decision to move out of
Austin, so now I'm flying _back_ to give the talk. Wheee...)


Re: [Toybox] [PATCH] clear.c: Clear scrollback buffer on non-vte (gnome based) terminals

2024-03-06 Thread Rob Landley
On 3/5/24 19:16, Oliver Webb via Toybox wrote:
> Taking a quick look at the release notes for hurd, they are starting x86_64
> and AMD64 support, Not surprised you've never seen it and neither have I.
> 
> In the 80s and 90s it was probably a lot more relevant than it is now.

No, it really wasn't.

https://landley.net/notes-2010.html#19-07-2010

Rob


Re: [Toybox] [PATCH] clear.c: Clear scrollback buffer on non-vte (gnome based) terminals

2024-03-05 Thread Rob Landley
On 3/5/24 18:31, enh via Toybox wrote:
>> We have a 7-10 year support horizon, How many terminal escape protocols have
>> been relevant in the last 10 years: One. The story is the same for UTF8 and LP64

There's a certain amount of 80/20 going on.

If you 80/20 twice you get 96%. (80% of the remaining 20%.)

If you 80/20 THREE times you get 99.2%.

Diminishing returns kick in real fast. I usually do one round and wait for
people to show up with use cases.

> (to be fair, toybox does support ILP32 too, not just LP64 --- it's
> just weird stuff like Windows' LLP64 that's explicitly out of scope
> aiui.)

They made their bed, explicitly and intentionally:

  https://devblogs.microsoft.com/oldnewthing/20050131-00/?p=36563

And then they made Windows Subsystem for Linux. (Twice, apparently.)

> i honestly still
> don't believe hurd actually exists!

It technically exists, it just doesn't WORK.

> i've never seen it, whereas i've
> actively used all the others you mention, plus the two i just
> mentioned, and Tru64 too. if you want "obscure but definitely a real
> thing", how about Plan 9?)

Eh, that wasn't exactly obscure, it was just tied up in really stupid
AT&T->Lucent licensing shenanigans until ~Y2K (so that nobody could SEE it
without forking over thousands of dollars, rumors of greatness but nobody had
personal experience), by which point it was about ten years moot. But it got
looted for a bunch of ideas like procfs, and Linux virtfs is a v9fs is 9p2000.L.

https://landley.net/kdocs/ols/2010/ols2010-pages-109-120.pdf

You want obscure, the Bell Labs guys kept doing Unix releases after v7, all the
way through v10, before starting over with Plan 9. They just never got published
outside of the labs (for the same reason Plan9 didn't, the commercialization
drive). They finally got released to the public about the same time Plan 9 did:

https://www.tuhs.org/Archive/Distributions/Research/

By the way, while FIPS 151-2 was in force and posix compliance was a requirement
to qualify for federal procurement contracts, EVERYBODY did a Unix. Apple did a
unix for mac hardware in 1988:

https://en.wikipedia.org/wiki/A/UX

Dell did a Unix:

https://gunkies.org/wiki/Dell_UNIX

Commodore did a unix for the amiga:

https://en.wikipedia.org/wiki/Amiga_Unix

Microsoft lied and claimed that Windows NT was a unix:

https://en.wikipedia.org/wiki/Microsoft_POSIX_subsystem

The "unix wars" got REALLY WEIRD for a while there.

Rob

(P.S. When I say "weird" I mean https://en.wikipedia.org/wiki/Eunice_(software)
which yes, I used as a teenager.)

