Re: RFC related to reading the script from an arbitrary file descriptor

Thorsten Glaser Mon, 03 Jun 2013 12:04:05 -0700

Ciprian Dorin Craciun dixit:

>    (A) Imagine a tool that (from one reason or the other) stores
>snippets of scripts (and various data files) in a database file (like
>SQLite or CDB), which could be invoked as (`do scriptlet
>some-arguments`).


I’d do that with temporary files anyway… file descriptors are
fragile, and the shell uses some for its own, too.

>    (B) An early initrd boot system that doesn't have `/proc` and
>`/dev` mounted, or even a writable `/tmp`, and which cleans every
>trace of its initial contents from rootfs, replaces it with another
>rootfs, and then instead of `init` executes a script that was in the
>initial initrd.  (Thus a kind of high level Linux "chainloader",
>usable in cloud environments.)

mksh requires a writable ${TMPDIR:-/tmp} for its operation anyway.
Otherwise, several things will not work and cause abnormal behaviour.

>    * (preferred) in case (A) the launcher creates a pipe, forces the
>OS to allocate for it a large enough buffer (up to 1MiB is allowed in
>Linux), writes the data there, closes the write side, and `execve`'s
>mksh;

I’m not too sure this would work for all cases either… and it’s
totally nonportable of course ;-) but let’s leave that point aside.

On the other hand, that case could easily use /dev/fd with -c and dot.
It would be hidden from the user, too.
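A minimal sketch of that /dev/fd variant, under a few assumptions: the launcher uses a pipe on fd 5, the names "scriptlet", "arg1" and "arg2" are made up for illustration, and plain sh stands in when mksh is not installed. It needs /dev/fd to exist, as it does on Linux.

```shell
# Sketch only: "scriptlet", arg1 and arg2 are hypothetical names; the
# interpreter falls back to plain sh when mksh is not installed.
interp=$(command -v mksh || command -v sh)

# The launcher writes the script text into a pipe. The child keeps the read
# end as fd 5, gets /dev/null as stdin, and dot-sources /dev/fd/5.
out=$(printf '%s\n' 'echo "name=$0 args=$*"' |
    "$interp" -c '. /dev/fd/5' scriptlet arg1 arg2 5<&0 </dev/null)

printf '%s\n' "$out"
```

The word after the -c string becomes $0, so a scriptlet that inspects its own name still sees something sensible, and the user never sees the fd plumbing.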

>(thus the script doesn't need to exist at all nowhere on the
>file system;)

That’s called Linux tmpfs ;) that’s even cheaper.

>    * in case (A) we could use `/tmp` (of course based on `tmpfs` to
>reduce disk overhead), open the file, write to it, unlink-it (thus a
>dangling inode until we close it), then give the file descriptor to
>`mksh` via `/dev/fd/x`;  however this technique is prone to various

That’s not nice. What you can do instead is to prepend a line to
rm the script to the script itself. Since you’re writing it anyway
that’s safer.

>    As I and you have observed, all these can be solved without
>touching `mksh` code, however the solution seems "almost" right.  To

Hmmm. I am not really convinced either way.

>be fair I don't know of any other interpreter that takes an argument
>denoting a file descriptor (without the `/dev/fd/x` trick).  Moreover

Yeah…

>my use-cases are very specific and obscure to justify this new
>feature.  However I think it could be a nice to have feature if the
>implementation isn't too costly.

It looks pretty costly though. Adding a new Source type isn’t as cheap
as it looks at first. And selecting the right file descriptor number to
use is almost impossible (0-9 are allowed to be used by the scripts; in
mksh, even more actually, but there’s a compile-time option to reduce
them a bit)…

In case (B) since you control the original script, it may be possible
to mount a tmpfs somewhere… not too sure… but this looks like it needs
some amount of coordination anyway, so you won’t run arbitrary scripts.
(Do note that pid 1 is required to be init and “special”, so there will
be hell to pay anyway – for example, pid 1 is *required* to wait() for
all orphaned children…)

>    The problem is that some scripts use `$0` to change their
>behaviour, thus they would break if they are run with the `-s`
>feature.

Right, but case (A) can use the -c variant easily… and, as I just
wrote, case (B) probably won’t run arbitrary scripts.

>    I know that these options are shared with `set`, however from the
>man-page there doesn't seem to exist a `-S` flag even there.

Right.

>could break future compatibility.  However there could be a
>long-option only such as `--script-fd` which is highly unlikely to
>conflict with something else.

There are no so-called GNU long options in Unix.

>    The main problem is that it is quite cryptic in what it actually
>does.  Thus a person reading it would wonder what happens there, and

I don’t think so. I think (and we had this discussion in Debian
recently) that people who don’t understand that part shouldn’t
be messing with it anyway. (Besides, it’s relatively easy to
comment.)

>    The second problem is that they rely on `/dev`, `/proc`, and
>`/tmp` being mounted.

/proc and /tmp; /dev is not needed (. /proc/self/fd/100 would work),
and you need ${TMPDIR:-/tmp} already anyway. (Most things currently
work without it, but some don’t, and there will be more that don’t
in the future, since temporary files are about the only mechanism
outside of in-process memory that can be used for some purposes.
Here documents have never worked without it, FWIW.)

The paths can be arbitrary, nothing in mksh hardcodes /tmp, and /proc
is Linux-only anyway. But since you’re on Linux already anyway, it’s
*much* cheaper (on the OS as whole) to just write a file into tmpfs,
prepend a line rm’ing it, and tell mksh to read that.
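A rough sketch of the self-removing temporary script, assuming mktemp(1) is available and ${TMPDIR:-/tmp} is writable; the file name "run" and the arguments are invented for the example, and plain sh stands in when mksh is not installed.

```shell
# Sketch only: "run", "one" and "two" are hypothetical names.
interp=$(command -v mksh || command -v sh)

# mktemp -d creates the directory with mode 0700, so other users cannot
# get at the script file inside it.
dir=$(mktemp -d "${TMPDIR:-/tmp}/scriptlet.XXXXXX") || exit 1
script=$dir/run

# The first line removes the script and its directory; the shell already has
# the file open, so the rest is still read from the unlinked inode.
cat >"$script" <<'EOF'
rm -f -- "$0" && rmdir -- "${0%/*}"
echo "running with $# arguments: $*"
EOF

out=$("$interp" "$script" one two)
printf '%s\n' "$out"
```

Since the script removes itself first, nothing is left behind even if the launcher dies right after starting the shell.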

>    About this, I think it's a potential source of bugs and strange
>errors...  I would have personally preferred that the shell `dup`'s
>the stdin descriptor and read from there, then open `/dev/null` on
>stdin so that the script doesn't misbehave and is free to replace

Yes, but there’s evidence towards…

>stdin without breaking anything.  (I don't know what POSIX has to say
>in this respect, but `bash` behaves just like `mksh`, thus I suspect
>it's a conscious design.)

… this – even OpenSSH has a flag to avoid ever reading from stdin, and
there are more scenarios (cat hosts | while read x; do ssh $x foo; done)
that won’t be caught by it either.
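The stdin problem in that loop can be reproduced without any network. In this sketch, greedy is a hypothetical stand-in for ssh (it simply drains its stdin), and the host names are made up.

```shell
# greedy is a hypothetical stand-in for ssh: it reads its stdin to EOF.
greedy() { cat >/dev/null; }

hosts='alpha
beta
gamma'

# The command inside the loop shares the loop's stdin and swallows the
# remaining lines, so read only ever sees the first one.
seen_bad=$(printf '%s\n' "$hosts" |
    while read -r x; do echo "$x"; greedy; done)

# Giving the command /dev/null as stdin leaves the list for read.
seen_ok=$(printf '%s\n' "$hosts" |
    while read -r x; do echo "$x"; greedy </dev/null; done)

printf 'without redirect: %s\n' "$seen_bad"
printf 'with redirect: %s\n' "$seen_ok"
```

The first loop visits only alpha; the second visits all three names.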

[ extended exec builtin ]
>    The extension could be useful, I've found once or twice a case
>where this could be useful (mainly it involved chain-loading another
>script from within a script.)

OK.

>    Why would this extension require `/dev/fd`?

No, it wouldn’t; /dev/fd is unportable.

>    There are two problems with `tmpfs` that I don't like:
>
>    (A) Its a security problem, because it must be handled with care.

There’s no reason it can’t be chmod 0700…

>    (C) Relying on a writable FS doesn't always work in early boot of
>a VM or container.

… which is not a problem if you have root, as…

>    (B) Many distributions still don't mount `tmpfs` by default, thus
>the disk is involved.

… is this. But if you’re nōn-root, this *would* be an issue…

… except I’ve recently read about precisely this, in the discussion
about Debian using tmpfs for /tmp by default (the idiots won, of
course ☹). The Linux kernel never writes short-lived temporary files
to the disc unless some sort of sync is involved. (The directory may
be affected.) So, if you remove the script immediately, no harm done.

Furthermore, on glibc-based systems, /dev/shm can be (ab)used…
although people don’t seem to like this, and that one is required
to be a (rather small) tmpfs for some glibc semantics nobody has
bothered to explain to me to date.

Another random thought… did you consider -c 'eval "$(cat <&5)"' ?
I’m willing to extend either $(<…) to accept $(<&5)¹ or x=<<EOF
to accept x=<&5 to get the whole content of the fd as string or
assigned to a variable.

¹ To not fork. Although, cat is a built-in command in mksh, so
  $(cat <&5) probably will be as cheap *and* require no further
  code changes; would need to analyse that.
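A small sketch of the eval variant, which needs neither /dev/fd nor /proc. It assumes the launcher hands over the script on fd 5 through a pipe; "scriptlet" and "arg1" are invented names, and plain sh stands in when mksh is not installed.

```shell
# Sketch only: "scriptlet" and "arg1" are hypothetical names.
interp=$(command -v mksh || command -v sh)

# The script text arrives on fd 5; the shell slurps it with cat and evals it.
# stdin is pointed at /dev/null so the script cannot eat the launcher's input.
out=$(printf '%s\n' 'echo "got $1 as $0"' |
    "$interp" -c 'eval "$(cat <&5)"' scriptlet arg1 5<&0 </dev/null)

printf '%s\n' "$out"
```

Note that the whole script is held in memory before it runs, which is fine for scriptlets but would not suit very large inputs.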

And another thought… you can just pass the *whole script* with
the -c option. I’ve got one script where I do something like:

while read host extrastuff; do
        do-lots-with-extrastuff
        # -n keeps ssh from swallowing the rest of the loop's stdin
        ssh -n "$host" mksh -c '
                foo
                bar
                baz
                '"$blah"'
                blargh
            '
        echo done.
done <files

In it, the amount of stuff passed to mksh is rather large.
(I think there’s one more layer of quoting involved due to
ssh, but you get the idea. Basically mksh -c "$scripttext".)

The idea behind those last two ideas is that, if you can
let mksh run the script, you can slurp it into (shell) memory
already and pass it on. (The last one will work best when the
shell invoking mksh is relatively modern, ideally mksh too.)
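As a sketch of the plain -c "$scripttext" form: here the script text sits in a variable standing in for whatever the launcher fetched from its database, and "do-verbose" is an invented name passed as $0. Plain sh is used when mksh is not installed.

```shell
# Sketch only: the script text and the name "do-verbose" are hypothetical.
interp=$(command -v mksh || command -v sh)

# Stand-in for a scriptlet fetched from the database; it changes behaviour
# depending on the name it is invoked under.
scripttext='case $0 in
(*-verbose) echo "verbose: $*" ;;
(*) echo "$*" ;;
esac'

# The first word after the script text becomes $0, the rest $1, $2, ...
out=$("$interp" -c "$scripttext" do-verbose one two)
printf '%s\n' "$out"
```

Scripts that switch on $0 keep working this way, which the -s route would not give them.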

Neither of these requires tmpfile stuff; although, $(…) usually
forks once. ${ …;} requires a tmpfile, and ${|…;} does not have
the semantics required for this trick (we can just run everything
usable in it directly).


Sorry for taking a few days in total to reply to this,
but the topic IMHO requires careful thought and discussion.
Please do not be disappointed that this took longer to answer
despite other traffic going on, on the list or in CVS.

bye,
//mirabilos
-- 
<mirabilos>    │ untested
<Natureshadow> │ tut natürlich
<Natureshadow> │ was auch sonst ...
<mirabilos>    │ fĳn ☺
