Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free

2024-03-13 Thread Rob Landley
On 3/8/24 21:22, Oliver Webb via Toybox wrote:
> TL;DR: Rant about sh.c's variable names I forgot to include in main email, I
> have a patch to start fixing it but it conflicts with other stuff and I have
> to re-do it
> 
> Reading through sh.c, most of the variable names are 2 letters long 
> (repeating the same letter),

I switched from single character local variables to double character ones
because they're easier to search for without an elaborate IDE trying to parse
the code.

> for short functions (<20 lines) this isn't a problem. For example, I don't 
> need a more
> descriptive name for the variable "c" in getvar() to understand it because I 
> can look at the uses
> of it and infer from context it's about variable types. Longer functions 
> where there are a lot of variables
> interacting with each other are a lot harder to keep track of.
> 
> For example, what does "cc" do in sh_main?

Hold the -c argument. The first assignment to it is:

cc = TT.sh.c;

> From the looks of it. It's the name
> of command/file being processed, but to _know_ that's the case I have to audit
> 50 lines of code, and check over seeing if that's what it really does.

Variables sometimes get reused to mean slightly different things, and their use
can change over time when a lot of editing is done to a function. Especially for
code that isn't finished yet, there can be unpolished bits.

Although in this case, "-c is held in the variable c but I generally double such
variable names to give vi's text search at least SOME chance of finding it"
seems reasonable to me?

If you want to complain, I do tend to have "s", "ss", and "sss" as an
alternative to "s1", "s2", and "s3" when the long descriptive name for the
variable would otherwise be just "string". That gets a bit awkward at times and
I've looked at renaming some but not come up with better names.

> When a name
> like "in_source", "inputsrc", or just "inname" would tell you that without 
> having to look
> over the code and make sure that's the case.

No, you still have to look at the code. Especially when the code is not
finished, as evidenced by it still being in pending/ and not passing half its
test suite. (Let alone my giant pile of "turn these into proper shell tests",
attached...)

And "c" holding the argument of "-c" is still probably a bad example to base
your rant on. And a command that's _flagrantly_ not finished yet may also be a
bad example...

> The problem isn't the length, the problem is that there isn't any convention 
> to it and a lot
> of names convey little to no information ("ss" is usually a string, but "what 
> does the string do?"
> is a question you need to commonly ask to find out what it's doing). It'd be 
> like if run_lines()
> was called "rr()" or "expand_arg" was called "ea()", we name functions just 
> fine (which makes
> this code somewhat readable, at least it's general structure)

Would you like me to stay out of your way while you take over toysh and refactor
it to your heart's content? I note that I will never touch it again if I hand it
over.

> "ff" is a common name for file descriptors.

In this idiom it's "f, but more searchable".

You're basically complaining there was an "i" variable that wasn't a loop index.

> So working down from sh_main, One could assume that 
> "TT.ff" must have something to do with file management, right?... No, it's a 
> doubly linked list
> of overhead for shell functions,

In this case it sounds like f stands for function, yes. Many words start with f.

> and "struct sh_fcall" is often called "ff" in functions that use
> it, we have 2 common names that do completely different things...

In multiple places, f stands for function, and there's more than one aspect to
functions.

> The problem isn't the length as I said, the problem is that there is no 
> convention for the naming
> of these.

Maybe I should move the file to "pending"?

> No pattern that you can follow, you usually have to go to the start of the 
> function
> (which can be hundreds of lines up, depending on what you are looking 
> through) and see how it gets
> assigned (And interacts with other variables that have the exact same problem)
> 
> I'd try to replace a lot of these names in a patch (I have already done so 
> for sh_main in a patch
> that I didn't send because getting the compiler to shut up was more 
> important, will send the patch
> when I'm sure it won't conflict with anything)

And as I said: if I were to I apply an aesthetic patch which does nothing but
make the code smell like you, toysh would be all yours and I would never touch
it again.

> but to be accurate in renaming I have to understand
> what they are, (The problem I'm trying to fix) which requires either better 
> names to begin with or
> extensive auditing and debug printf-ing of the code that takes up a lot of 
> time.

Clearly I have been too slow to finish this, and it's time for you to take over.
Your aesthetic sense is empirically superior to mine, I 

Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free

2024-03-13 Thread Oliver Webb via Toybox
Added P.S.:

If this is territorial, just admit it. I won't bring up the
"is keeping territory a good thing in a open-source project?" argument. I never
mentioned territory because I don't think like that (I've waited for and 
welcomed cleanup passes
on code I'm actively developing). You mentioned territory ("I'll never touch 
this again if other
people are doing stylistic changes on it for auditability"). Like my goal was 
"WORLD DOMINATION!"
instead of "Make this more readable so I and other people can build off of and 
improve this".

And now, a link with no meaning or parallels to this discussion:
https://landley.net/notes-2018.html#06-09-2018
...

I'll leave it here unless you wanna bring up a counterargument, I'd like
to hear reasons why I'm wrong on this.

- Oliver Webb aquahobby...@proton.me

___
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net


Re: [Toybox] [PATCH] toysh: fix -Wuse-after-free

2024-03-13 Thread Oliver Webb via Toybox
> On 3/8/24 21:22, Oliver Webb via Toybox wrote:

> > Reading through sh.c, most of the variable names are 2 letters long 
> > (repeating the same letter),
> 
> I switched from single character local variables to double character ones
> because they're easier to search for without an elaborate IDE trying to parse
> the code.

/\ searches for specific words in vim

> If you want to complain, I do tend to have "s", "ss", and "sss" as an

I did complain about that in the email. 

> would otherwise be just "string". 

That be like naming _every_ integer "integer", (No wait, strings are _worse_ 
because they can hold
a arbitrary string of anything)

> > When a name
> > like "in_source", "inputsrc", or just "inname" would tell you that without 
> > having to look
> > over the code and make sure that's the case.
> 
> No, you still have to look at the code. Especially when the code is not
> finished, as evidenced by it still being in pending/ and not passing half its
> test suite. (Let alone my giant pile of "turn these into proper shell tests",
> attached...)

Not if you are skimming over a section, that is mostly unimportant but 
interacts with a certain thing
you are looking for bugs in.

> And "c" holding the argument of "-c" is still probably a bad example to base
> your rant on.

I couldn't get far enough down from sh_main() to see any better examples, sorry.

> And a command that's flagrantly not finished yet may also be a bad example...

This is the only one I've seen which does stuff like this. No other command in 
pending
does stuff like this (At least not one I've looked at).

> > "ff" is a common name for file descriptors.
> 
> In this idiom it's "f, but more searchable".
> 
> You're basically complaining there was an "i" variable that wasn't a loop 
> index.

That IS a problem, and when breaking/mixing the few conventions there are 
happens _constantly_,
with local variables that interact with each other. That becomes a much larger 
problem.
"i" is usually a counter used for at most a few dozen lines, "ss" and friends 
span regularly for
hundreds of lines.

> In multiple places, f stands for function, and there's more than one aspect to
> functions.

I'm sure to someone who wrote 99% of it everything makes so much more sense. 
Just like
the people who maintain GCC and the coreutils and IOCCC entries. Unfortunately, 
I never
got the chance to look at it when it was somewhat auditable. So when I try to 
remove warnings/
fix test cases

> > The problem isn't the length as I said, the problem is that there is no 
> > convention for the naming
> > of these.
> 
> Maybe I should move the file to "pending"?

This is the most important command in the entire project (you can't _do 
anything_ on a machine without
a shell), No other pending command I've ever seen is this in-auditable, even 
ones written by you.
 
> And as I said: if I were to I apply an aesthetic patch which does nothing but
> make the code smell like you, toysh would be all yours and I would never touch
> it again.

Make it look like whatever you want, I honestly couldn't care. I am someone 
who's
looked over the infrastructure and cares about toybox enough to work on and 
learn about
it in my free time. And even binary searched sh.c for a segfault once. And I 
can barely
understand it without taking hours upon hours to extensively look over every 
part because variable
names don't _tell you anything_. The person who is trying to build a minimal 
system for
whatever reason is not gonna sit down for 12 hours to debug-printf their way 
into understanding
it so they can get whatever the bug of the day is fixed.

If you wanna do it all yourself and don't care how obfuscated it is to an 
outside perspective without
the testing and development knowledge gained along the way and internal thought 
process at the time.
I honestly won't complain, anything that gets test cases fixed and missing 
builtins included. I can
treat it like a black box until a changing number of bash features (what _are_ 
we implementing?)
from a changing specification of a 84,000 line shell is added, then refactored 
into a state where it's readable.
And I'll keep my hands off your territory until then.

> Clearly I have been too slow to finish this, and it's time for you to take 
> over.
> Your aesthetic sense is empirically superior to mine,

(Because I want variable names that say something, and you want ones that say 
nothing because
"less bytes" and "sh.c is my territory and I don't expect anyone else to even 
look at it until it's done
because clearly I don't need any help"?)

> I stand chastized.

I'm sorry I offended you, but this code is easily the hardest code in the 
project to audit.
I've never had issues figuring out how the other code works, this is the 
exception, _not_ the norm.
Part of this is behemoth functions that set out to implement entire subsystems, 
which allows for
spaghetti code, but a lot of this is also variable names saying _nothing_.

You want 

Re: [Toybox] [PATCH] watch: flush the buffer each round.

2024-03-13 Thread Rob Landley
On 3/12/24 12:24, enh wrote:
> On Tue, Mar 12, 2024 at 8:45 AM Rob Landley  wrote:
>>
>> Hello from Minneapolis: https://mstdn.jp/@landley/112078501045637288
>>
>> Still _really_ fried from my move, but at least chipping away at the email
>> backlog...
> 
> at least it's the same timezone :-)

That helps less than you'd think. (Flying to Japan and back is a day's recovery
time.)

>> My original approach to FILE * was to just always flush (ala xprintf) but 
>> there
>> were objections to that, which I said would open a can of micromanagement 
>> worms.
>> Then I changed the default buffer type to be unbuffered instead, and there 
>> were
>> objections to that, which I said would open a can of micromanagement worms...
> 
> (well, yeah, the "don't actually flush" argument was weird. but
> there's nothing inherently wrong with "flush and exit on failure", and
> for those places that _do_ have an opinion on when is a good time to
> flush, using that rather than directly using fflush would be
> reasonable?)

I thought xprintf() doing a flush was reasonable, and every single manual
flush() was a wart. Clearly my design sense is inapplicable to this area.

>> > but none of this
>> > is relevant here --- the problem here is that we're not flushing _at
>> > all_, so ToT watch is just broken.
>>
>> Micromanagement, yes.
> 
> not really. there's an obvious point for watch to flush --- "that's
> this round of output done, and i'm an interactive program". no
> heuristic is going to be right for that because it's going to depend
> on stuff like "what's the update frequency? how much attention is the
> user paying? what's the content?". `watch ls -l` with a 5s frequency
> being a second or so behind is fine. `watch date` with a 1s frequency
> not.

Naming things and cache invalidation: when dealing with a famously hard part,
choose a strategy of requiring explicit manual management. Not because the
language requires it, but because we volunteered as a performance optimization.

>> >> This code is mixing dprintf(1, ...) with printf() and fflush(stdout) in a 
>> >> way
>> >> that seems unlikely to end well, and I would like to think it through 
>> >> when I'm
>> >> not moving house. (Or have somebody whose brain isn't jello reassure me 
>> >> that
>> >> it's coherent.)
>> >
>> > not in the ToT copy it isn't. maybe you have local changes?
>>
>> Hmmm, yup looks like I was already fixing this command up to not use FILE * 
>> at
>> all anymore (what is this, my fourth attempt to get away from FILE * 
>> buffering
>> damage in a systematic way?) and got distracted partway through...
> 
> what's the motivation in watch? i don't see any benefit?

I don't remember at the moment (still fried), but probably just gradually
switching to "never use stdio for anything" now that it's buffering?

Back before the default of calling printf() was to produce no output without
remembering to say "simon says", you could freely mix file descriptor writes and
file pointer writes, so I didn't have to CARE if crunch_escape() in lib/ was
calling fputs() or write() behind the scenes.

But now I need to know. Now I need to retroactively go through and retrace every
single output path in every single command, and as with the
dprintf/printf/dprintf example pasted above (whether checked in or not), you
can't just stick a flush in a loop once and expect stuff to go out in the right
order.

I'm aware of the argument "use FILE * for everything, convert even tee to work
based on FILE * instead of filehandles, and then it's all consistent". Which is
the danegeld systemd/windows/apple argument: this tool is so terrible
interoperating with ANYTHING else that it must purge all other tools from its
ecosystem, so bow before it and obey the imperative to cleanse the unbelievers.

If I reacted well to that argument, I wouldn't have wound up on Linux in the
first place...

> stdio with
> explicit flush is _great_ for this use case, where the caller has a
> specific "and i'm done for now" point that _it_ knows.

When start_redraw() calls terminal_probesize() it outputs the probe sequence
through xprintf() but doesn't do a fflush(). (Before the buffer type changed, it
didn't need to. Before xprintf() lost its flush, it didn't need to.) So the
query to see if we got a response or not isn't going to get a response, so the
first screen draw will not take that size query into account when operating
across a serial terminal or qemu console. But there's a fallback path (assuming
80x25) so we may not notice the code being subtly broken.

We will find places the code is subtly broken by this for _years_...

>> > so the new patch (attached) is a better fix. but
>> > that fixes the normal case and -b for me...

I deleted my local changes and applied it. Pity, I'd improved the help text I
thought...

>> Spraying down the code with manual fflush() calls strikes me as a code smell
>> that means "this will break the next time anybody touches anything".
> 
> in general, yes. but (as