On 09/12/2014 10:02 PM, Thorsten Glaser wrote:
tl;dr: We probably should simplify the code (no promises
about $RANDOM other than its value range) and not export
$RANDOM any more, and only use arc4random-related functions
where really convenient; “lesser” OSes are SOL. We move
the task of getting better random numbers onto the script
writers (and provide a sample implementation already), except on
MirBSD, where it is convenient ☺ (but also probably only
useful to replace dice rolls to decide on what to have
for lunch that day).
Lorenzo dixit:
I was wondering if there would be any trouble replacing the lcg with a
generator whose state is __made__ of n>1 words - something like, off the top of
my head, xor128, JKISS, you name it.
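For concreteness, a rough sketch of what "state made of n>1 words" means, using Marsaglia's xor128 with its published default seeds. It is written as shell arithmetic only to keep it short; the names xs_x, xs_y, xs_z, xs_w and xor128_next are made up for this example, and it assumes a shell whose arithmetic is wider than 32 bits (bash, or dash on a 64-bit system), so it is not how mksh itself would do it in C.

```sh
# xor128 (Marsaglia xorshift): four 32-bit state words instead of one.
# Assumption: shell arithmetic is wider than 32 bits, hence the explicit masks.
xs_x=123456789
xs_y=362436069
xs_z=521288629
xs_w=88675123

xor128_next() {
	# t = x ^ (x << 11), truncated to 32 bits
	xs_t=$(( (xs_x ^ (xs_x << 11)) & 0xFFFFFFFF ))
	# rotate the state words: x <- y, y <- z, z <- w
	xs_x=$xs_y
	xs_y=$xs_z
	xs_z=$xs_w
	# w = w ^ (w >> 19) ^ t ^ (t >> 8)
	xs_w=$(( (xs_w ^ (xs_w >> 19) ^ xs_t ^ (xs_t >> 8)) & 0xFFFFFFFF ))
	xor128_result=$xs_w
}
```

Calling xor128_next once with the seeds above leaves 3701687786 in xor128_result. The result goes into a variable rather than stdout so the state survives (a command substitution would run in a subshell and lose it).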
Something I really want is a sponge construct, like Keccak, but
one where you can constantly write to and read from.
I really wish I could tell you something meaningful, but all I know is
that it won sha3... iirc comments on Keccak I can't imagine how you
think chacha20 is complex and sponge functions aren't ;p but maybe those
comments were wrong, I *really* don't know (and didn't look into sponge
functions because of them).
Since this is out of the scope of mksh, I’m somewhat tempted to
remove the export and bring back the state we once had:
• if arc4random() exists on the system, always use that
• if arc4random() does not exist, just use an LCG with a little
extra seeding (stack address, etc.) or none at all (since
the OSes without it likely also don’t have ASLR and such)
This has several downsides:
• we used to have set ±o arc4random
‣ one variant: constant, to see which one is in use
⇒ led to the LCG codepath being untested/buggy
‣ other variant: can enable/disable arc4random
⇒ code bloat, not much benefit
• if using arc4random(), we use arc4random_addrandom() to
feed assignments to $RANDOM back, but OpenBSD removed
the interface…
• we used to ship one, which wrote to /dev/urandom to feed
back to the kernel, but the Gentoo Linux people didn’t
like that
Found the patch, no idea what bothered them; I guess they just didn't
know you can safely write to /dev/random and urandom on linux.
Since they complain, an even simpler implementation is a NOP :)
• using arc4random() on systems where it’s not in libc will
make it the packager’s choice, which we don’t like much
But we can just bite the bullet here and say “we use arc4random
if you have it, and otherwise you get something that always
produces the same output sequence for the same assignment to
RANDOM, and it’s your fault”. This a̲l̲s̲o̲ has a downside, namely
people expecting deterministic output again, if they program
and test on a “lesser” OS (one without arc4random), and we just
use the arc4random_pushb_fast() macro on MirBSD for pushback, and
if it doesn’t exist (OpenBSD, Linux – also sorta lesser OSes)
there is no pushback…
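To illustrate the "same output sequence for the same assignment" behaviour of the fallback path, here is a minimal LCG sketch in shell. The constants are the ones from the ANSI C example rand(), chosen purely as an assumption for illustration; they are not claimed to be what mksh uses, and lcg_seed, lcg_next and lcg_result are hypothetical names.

```sh
# Minimal LCG with 31 bits of state; output is limited to 0..32767,
# i.e. the only thing promised about $RANDOM is its value range.
# Assumption: constants are from the ANSI C sample rand(), not from mksh.
lcg_state=1

lcg_seed() {
	# seeding corresponds to "assignment to RANDOM" in the text above
	lcg_state=$(( $1 & 0x7FFFFFFF ))
}

lcg_next() {
	lcg_state=$(( (lcg_state * 1103515245 + 12345) & 0x7FFFFFFF ))
	# drop the weak low bits, keep bits 16..30
	lcg_result=$(( lcg_state >> 16 ))
}

# Same seed, same sequence:
lcg_seed 42; lcg_next; first_run=$lcg_result
lcg_seed 42; lcg_next; second_run=$lcg_result
```

Here first_run and second_run are always equal, which is exactly the determinism a script author might come to rely on when testing on an OS without arc4random.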
"Remove arc4random_stir() and arc4random_addrandom(), which none should
be using directly. Well, a few rare people cloned it upstream and it
will take a bit of time for them to learn.
ok various"
Do you know what Linux does better than OpenBSD? The rationale for commits
goes in the commit message rather than just being mentioned...
As for “better” LCG-ish things: nah, it’s either cryptographically
sound (aRC4) or speed/convenience.
ACK.
I think we’d be best off with an mksh not promising anything
about the quality of its $RANDOM, and using arc4random() only
where it is really convenient (e.g. on OpenBSD and MirBSD, the
libc malloc() uses it already anyway). We kinda don’t promise
anything in the manpage already… and it would shrink the code.
I'll look into the cvs equivalent of "git log -p" when I have the time,
but a glance at mksh's history suggests that it used to have
arc4random.c - was it really that painful to port?
Btw, agreed, arc4random should be everywhere.
Right. But I have a pure shell implementation for when people
really want it… example use:
https://www.teckids.org/gitweb/?p=verein.git;a=blob;f=util/projrand;h=80f3210cf77314c630086def1062958a528414ab;hb=HEAD
(Watch that space. I also already have the idea to add the
timing of the “Glücksfee” (person doing the lottery drawing
by hitting Return occasionally) to the arcfour state…)
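For script writers who just want an unpredictable seed, something like the following may be enough. This is only a sketch and not the implementation linked above; it assumes /dev/urandom exists and that od supports the POSIX -t u4 type specifier, and urandom_u32 is a made-up name.

```sh
# Read four bytes from /dev/urandom and print them as one unsigned
# decimal number. Assumption: /dev/urandom is present and readable.
urandom_u32() {
	od -An -N4 -tu4 /dev/urandom | tr -d ' \t\n'
}

seed=$(urandom_u32)
```

The value in seed can then feed whatever generator the script uses itself, so the shell does not have to promise anything about $RANDOM.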
I didn't know urandom in linux had these kinds of problems, but
It does…
Any reference? I knew about the insanity of spitting out "random" bytes
before being fully seeded, and I know (I compared myself) that it's
slower than other implementations, but what's the problem with getting
random bytes out of /dev/urandom?
mksh -c 'for i in $(seq 1000) ; do head /dev/urandom||exit; done' | wc -c
runs in just above one second here (on linux) and outputs ~2.44MB;
/proc/sys/kernel/random/entropy_avail outputs 912 after running the
above a few times - did I miss anything?
modern arc4random uses chacha20, which only requires 16/32 bytes for
its keys.
N̲O̲T̲ “modern” but “OpenBSD’s latest”. This is not “modern”,
it just follows a worrisome trend – not only is DJB’s code
basically illegible (coding style) and incomprehensible to
non-mathematicians, but also unlicenced software and thus
violating http://www.openbsd.org/policy.html (especially
the last two paragraphs), as it both is not ultimately clear
whether DJB’s code is really in PD in the USA, and it most
certainly is not in PD in most other countries.
Except for the trend about complex code (I'm running systemd right now!)
I have to disagree on just about everything.
= Worrisome trend
Look at the papers describing chacha20/salsa20 (chacha20 is described as
"salsa20 with these changes"); I'm not a mathematician, but the papers
are __really__ readable, they do a lot to explain the design decisions
and just about any developer could write a C implementation of chacha
after reading them - seriously, try it, he literally describes how to
implement the algorithm bottom up, and I bet you'll end up with code
that strongly resembles
http://cr.yp.to/streamciphers/timings/estreambench/submissions/salsa20/chacha8/ref/chacha.c
(notice that the api was chosen by the eSTREAM competition).
The code used in openbsd's chacha is basically the above with only two
changes:
1) it's optimized (by djb) in the __obvious__ way, ie he replaces "u32
x[16],input[16]" with "u32 x0,x1,..." - the speed difference is much
bigger than I thought!
You pay it with more lines, but they are boringly obvious...
2) they (openbsd) added "#ifndef KEYSTREAM_ONLY" - the original xors the
plaintext, and gives out the keystream by xoring with zero; since
they only want the keystream...
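To give an idea of how small the core of the algorithm is, here is the ChaCha quarter round as a sketch in shell arithmetic (not C, and not the OpenBSD code). The names rotl32, quarterround and qa..qd are invented for this example, and it assumes shell arithmetic wider than 32 bits (bash, or dash on a 64-bit system). The input words are the quarter-round test vector from RFC 7539, section 2.1.1.

```sh
# One ChaCha quarter round on the 32-bit words qa, qb, qc, qd.
# Assumption: shell arithmetic is wider than 32 bits, hence the masks.
rotl32() {
	# rotl32 VALUE COUNT: leaves VALUE rotated left by COUNT bits in $rot
	rot=$(( (($1 << $2) | ($1 >> (32 - $2))) & 0xFFFFFFFF ))
}

quarterround() {
	# add, xor, rotate; four times with rotation counts 16, 12, 8, 7
	qa=$(( (qa + qb) & 0xFFFFFFFF )); rotl32 $(( qd ^ qa )) 16; qd=$rot
	qc=$(( (qc + qd) & 0xFFFFFFFF )); rotl32 $(( qb ^ qc )) 12; qb=$rot
	qa=$(( (qa + qb) & 0xFFFFFFFF )); rotl32 $(( qd ^ qa )) 8; qd=$rot
	qc=$(( (qc + qd) & 0xFFFFFFFF )); rotl32 $(( qb ^ qc )) 7; qb=$rot
}

qa=$((0x11111111))
qb=$((0x01020304))
qc=$((0x9b8d6f43))
qd=$((0x01234567))
quarterround
printf '%08x %08x %08x %08x\n' "$qa" "$qb" "$qc" "$qd"
```

With that input this prints ea2a92f4 cb1cf8ce 4581472e 5881c4bb. The full cipher applies this same operation repeatedly to columns and diagonals of a 16-word state.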
Also some of the choices are so insanely good that even I can appreciate
them - eg remember how one of rc4's weak points is the lousy key
schedule, and you need to skip n*256 bytes? Compare chacha20's key
schedule and make sure your jaw doesn't drop...
Don't even get me started on security ;)
= Unlicensed software
Look at
http://openbsd.cs.toronto.edu/cgi-bin/cvsweb/src/lib/libc/crypt/chacha_private.h,
/*
chacha-merged.c version 20080118
D. J. Bernstein
Public domain.
*/
I know that djb used to screw up licenses in a glamorous way, but
luckily he changed his mind a few years ago, see
http://cr.yp.to/publicdomain.html and http://cr.yp.to/distributors.html
- he changed the old troublesome "license" and got at least a copyright
lawyer insisting he didn't to admit that he was wrong :)
(DJB could fix that easily – others do – but refuses to
even acknowledge the problem we non-US-Americans have.
But then, his track record wrt. software licencing is
pretty… dirty.)
Anyway, I do not discuss upstream things like this in the
Debian bugtracker, as this is most definitely not a bug
in the package.
You're totally right, sending emails is too easy - sorry :)
Thanks for agreeing, and sorry for taking a bit to respond.
bye,
//mirabilos
Np, good answers are better than fast answers (I'm often guilty of
answering too fast and regret it later).