Cryptography-Digest Digest #305, Volume #13 Mon, 11 Dec 00 05:13:01 EST
Contents:
Re: [Question] Generation of random keys - LONG - algorithms, randomness ("John A.
Malley")
Re: [Question] Generation of random keys - LONG - algorithms, randomness and
unpredictability (Eric Lee Green)
Re: [Question] Generation of random keys - LONG - algorithms, ("John A. Malley")
Re: A challenge ("Jakob Jonsson")
3D-Now any good? ("Michael Brown")
Re: Revised cipher ("Michael Brown")
Re: breaking rsa knowing the original text ("Michael Brown")
Re: Modular Arithmetic ("Michael Brown")
Re: [Question] Generation of random keys (Benjamin Goldberg)
Re: EDDF: the intended audience (Benjamin Goldberg)
Re: Array shuffling (Benjamin Goldberg)
Re: Array shuffling (Benjamin Goldberg)
----------------------------------------------------------------------------
From: "John A. Malley" <[EMAIL PROTECTED]>
Subject: Re: [Question] Generation of random keys - LONG - algorithms, randomness
Date: Sun, 10 Dec 2000 22:29:49 -0800
David Schwartz wrote:
>
> Bryan Olson wrote:
>
> > Malley stated that he'd use "randomness" to mean more than
> > one outcome has non-zero probability.
>
> This is the weakest definition of "randomness" I've ever seen. It also
> has an ambiguity problem, because probability has to be computed with
> respect to some set of knowledge.
The post explaining the non-existence of an algorithm R to generate
random outputs from itself used a very simple definition for a random
output based on probability theory. The following definitions and
explanation hopefully remove any ambiguity.
An experiment is a procedure that yields one of a given set of outcomes.
The individual possible outcomes are called simple events. The set of
all possible outcomes is called the sample space S. Let the number of
possible outcomes be finite. Let each possible outcome be labeled s_1,
s_2, s_3 ... s_n where n is the total number of possible outcomes.
There is no way to predict with certainty what the outcome of a
particular experiment will be when it's conducted. Sometimes the outcome
of the experiment is s_1, sometimes it's s_2, sometimes it's s_3 and so
on.
The outcome of an experiment is random.
A probability distribution P on the sample space S is a sequence of
non-negative numbers p_1, p_2, p_3, ... p_n whose sum is equal to 1.
The number p_i is interpreted as the probability of s_i being the
outcome of a particular experiment.
An event E is a subset of the sample space S. The probability that event
E occurs, denoted P(E), is the sum of the probabilities p_i of all
simple events s_i which belong to E.
(All of these definitions - an experiment, simple events, sample space,
probability distribution - come straight from the "Handbook of Applied
Cryptography", Chapter 2, Section 2.1, available at
http://cacr.math.uwaterloo.ca/hac/ )
Let's return to the post describing the hypothesized algorithm R that
generates random output by itself, which explains why the existence of
such an algorithm contradicts the very definition of an algorithm -
and hence why R cannot exist.
The post hypothesized an algorithm R that generated random output by
itself. The algorithm R is an experiment. When R is run on the input I
it yields one of two possible outputs, denoted as the two cases
R_o1 and R_o2. These two cases are the two possible outcomes of the
experiment (running the algorithm R on input I.) It's assumed that for
algorithm R it's not possible to predict with certainty before
conducting the experiment (running the algorithm R on input I) which of
the two outcomes would result (case R_o1 or case R_o2.) That's why the
hypothesized algorithm R produces random output.
(A note here - the hypothesized algorithm R MUST generate at least two
different outputs from the same input if it generates random outputs.
Why? Suppose for every input I_i that R runs on it generates the output
O_i and every time R runs on I_i it outputs O_i. Then R is
deterministic. )
There is some probability p_o1 associated with case R_o1 and probability
p_o2 associated with case R_o2 such that p_o1, p_o2 are both
non-negative and p_o1 + p_o2 sum to 1. The only two events are the two
cases R_o1 and R_o2.
The key point of the previous post is the fact that running the
hypothesized algorithm R over and over on the same input I (i.e.
multiple experiments) such that two different outcomes are seen (case
R_o1 and case R_o2) means the SAME steps were followed in BOTH cases,
yet the end states differ when the SAME algorithm R halts. Identical
definite steps on identical input must produce identical results, so
at least one step in the hypothesized algorithm R must not be definite
- BUT - every step in an algorithm is definite. That's part of what
makes an algorithm an algorithm (check out the cited sources and
references in the post.
Read Knuth, "The Art of Computer Programming" vol. 1 for example.) So
the hypothesized algorithm R cannot be an algorithm when it generates
random outputs, from itself, as an algorithm! Given this contradiction
the hypothesized algorithm R cannot exist.
Hope this helps,
John A. Malley
[EMAIL PROTECTED]
------------------------------
From: [EMAIL PROTECTED] (Eric Lee Green)
Subject: Re: [Question] Generation of random keys - LONG - algorithms, randomness and
unpredictability
Reply-To: [EMAIL PROTECTED]
Date: Mon, 11 Dec 2000 06:32:16 GMT
On Sun, 10 Dec 2000 11:36:35 -0800, John A. Malley <> wrote:
>Actually the situation is the other way 'round -
>
>1) An algorithm cannot produce randomness.
>
>2) An algorithm can produce unpredictability.
I believe that you have reversed the definitions for randomness
and unpredictability here.
Randomness has a statistical meaning: a set of numbers that comprise a
random distribution. There exist statistical tests for whether
something comprises a random distribution or not. Randomness is NOT
the same thing as unpredictability. For example, if I know the
previous output of:
return ((next = next * 1103515245 + 12345) % ((u_int)RAND_MAX + 1));
then I can predict, with 100% accuracy, what the next output of this
mathematical algorithm will be. Yet this snippet of code, pulled out of
the OpenBSD source 'rand.c', will produce a distribution that passes
most known tests for randomness, while failing utterly at the goal of
producing something unpredictable.
Unpredictable means that I cannot guess the next output. I should not
be able to guess (or compute) the next output given that I know the
previous output, and if I intercept outputs separated by a few, I
should not be able to guess (or compute) the previous output. The most
important part of being unpredictable is to gather enough entropy to
create an unpredictable initial starting state (and source of future
unpredictability) for the algorithm -- events whose magnitude and
occurrence cannot themselves be easily controlled, measured or
predicted by the attacker. For all computer algorithms, if I know the
starting state and all inputs, I can perfectly predict the
output. This is inherent in the beast. Even the best producers of
unpredictable numbers are worthless if an attacker can control the
starting state and inputs.
The goal, then, when producing a good cryptographic "random number
generator", is to both produce a random distribution (if your
generator produces only a small subset, the attacker can focus on that
small subset), and an unpredictable distribution (a distribution where
analysis of an output or set of outputs will not allow prediction of
prior or future outputs). The two are not the same -- as my snippet of
code from rand.c should explain (a producer of random distributions,
but NOT a producer of unpredictable numbers). Computers are quite good
at doing the former -- random distributions. Computers are lousy at
doing the latter -- unpredictable distributions.
The biggest challenge in cryptosystems today is coming up with a
suitably large set of unpredictable events to seed the generator with
-- and to have a generator whose output is in itself difficult to
predict (something which lately seems to be done by pressing
cryptographic components such as ciphers and checksums into service). This
is, alas, the primary failing of Ocotillo (the PRNG that I wrote for
Unix platforms that do not have a /dev/random device)... Unix, by
design, makes it very hard to intercept low-level entropic events such
as timer ticks and hard drive interrupts, unless you're operating on
the device driver level (like /dev/random on Linux and FreeBSD). If
you have 'root' access, it is possible to narrow down considerably
the range of prior and future outputs of Ocotillo... though, since the
whole point of the PRNG in my particular application is for generating
keys to prevent people from GETTING root access, that's not too big a
problem -- for my particular application. For other applications,
obviously it could be.
--
Eric Lee Green There is No Conspiracy
[EMAIL PROTECTED] http://www.badtux.org
------------------------------
From: "John A. Malley" <[EMAIL PROTECTED]>
Subject: Re: [Question] Generation of random keys - LONG - algorithms,
Date: Sun, 10 Dec 2000 23:37:18 -0800
Eric Lee Green wrote:
>
> On Sun, 10 Dec 2000 11:36:35 -0800, John A. Malley <> wrote:
> >Actually the situation is the other way 'round -
> >
> >1) An algorithm cannot produce randomness.
> >
> >2) An algorithm can produce unpredictability.
>
> I believe that you have reversed the definitions for randomness
> and unpredictability here.
>
Item 1 means there is no algorithm that generates random output by
itself (i.e. given input I the hypothetical random algorithm R generates
output O_1 or output O_2. Each run of algorithm R on the same input I
is an experiment where one outcome is O_1 and the other is O_2, and one
cannot specify ahead of time exactly which outcome will occur -
therefore R is not deterministic, it's random.) There is no such R.
Item 2 means there is an algorithm that generates an output sequence of
bits such that there is no polynomial-time algorithm A that can take
the first L bits of the output sequence and predict the (L+1)st bit
with probability significantly greater than 1/2.
The definition of predictability used is straight from Chapter 5,
Section 5.2 of the "Handbook of Applied Cryptography" as explained in
the referenced post. The handbook is on-line at
http://cacr.math.uwaterloo.ca/hac/
A cryptographically secure pseudo-random bit generator (which is an
algorithm) produces output sequences that are unpredictable (per the
above definition of predictable.) Therefore an algorithm can produce
unpredictability. See Chapter 5, Section 5.5 of the HAC.
> Randomness is NOT
> the same thing as unpredictability.
Agreed, and nothing in my post indicated that randomness was the same
thing as unpredictability. Predictability with reference to the output
sequence of a PRBG has a well defined meaning.
>
> Unpredictable means that I cannot guess the next output.
Yes, and that is part of the definition of the next-bit test for PRBGs.
> The biggest challenge in cryptosystems today is coming up with a
> suitably large set of unpredictable events to seed the generator with
Agreed. This is an interesting engineering challenge.
> -- and to have a generator whose output is in itself difficult to
> predict (something which seems to be based on abuse of cryptographic
> components such as ciphers and cryptographic checksums lately).
In much of the cryptographic literature the seed for a CSPRBG is
described/defined as a random variable on a uniform distribution. This
random seed is supplied to the CSPRBG and it proceeds to generate a
cryptographically secure pseudo-random bit output sequence that is
unpredictable, which means the next bit of the output sequence cannot be
predicted with significantly greater than 1/2 probability using a
polynomial-time algorithm that takes all of the previous output bits
in the sequence.
That's where I'm coming from in these posts.
Hope this helps,
John A. Malley
[EMAIL PROTECTED]
------------------------------
From: "Jakob Jonsson" <[EMAIL PROTECTED]>
Subject: Re: A challenge
Date: Mon, 11 Dec 2000 09:03:43 +0100
Hi,
For newsgroup readers: This text contains some (minor) spoilers, so please
beware.
I think this is a quite good exercise, at least if the students (or whatever
they are) have some cryptographic experience in advance (but don't expect
them to break it within an hour unless they are particularly clever). I have
tried it myself, and I succeeded in extracting the plaintext. Yet, I failed
to determine the general rule for decryption. Decrypting a single word by
checking all 26 possible rotations is easy (given that you know the rule),
but it is hard to see how the rotation numbers are generated.
By the way, there seem to be plenty of errors in the ciphertext:
"panumfhhna" -> "panumfhhnz"
"hhuqvxajf" -> "hhuqvxzjf"
"tvuuxxnxl" -> "tvuuxsnxl"
"uffc" -> "uftc"
"maaaak" -> "maalak" (?)
"libbr" -> "libkr"
"qbb" -> "qbp" (etc ;) )
"uaq" -> "uhq"
etc.
(Or I may have misinterpreted something...) Are those errors deliberate?
They make the cryptanalysis a bit harder, but they certainly give the cipher
a flavor of authenticity.
A suitable hint for students who get stuck: Tell them to guess the meaning
of obk and ive and ask them to find the correct way of decrypting these
words. They can check their hypotheses by decrypting obkrw and ivelq. (This
is the way I broke it.)
Finally (this is clearly a spoiler):
Where does the text come from? To me, it sounds like a Star
Wars/Tolkien-inspired poem. When I tried to decrypt "Ftfjlkai", the only
thing that was close to making sense was "Shinuawa". Is there some error in
"Ftfjlkai" or is Shinuawa the name of the kingdom in the poem?
Jakob
<[EMAIL PROTECTED]> skrev i meddelandet
news:90mtvl$el6$[EMAIL PROTECTED]...
> I'm developing a cypher for a game and would like the encoded message
> to be breakable without extensive use of a computer. Since I know the
> cypher I am hardly a good judge of how difficult it is to crack.
> Therefore, I am putting out the challenge to any of you to break the
> encoded text that follows.
>
> And no, this has nothing to do with school.
>
> Thanks!
>
> E panumfhhna fa obk sngei oqbo hhuqvxajf md lnwlx. N ekav hud ayhhm wm
> ghcgxcdb paqnh wjcah xn pa lipa. O megu uffc uaq thrtpf. Ylu ocmoka
> uk Ftfjlkai yaeawqj nh ome jrkk. Tgpm sie wm kaswq pkdbf zlo libbr
> obkrw rrerfvdjfwpv, maaaak owq yii rl mxx, mgkk ive qbufaxl nd ivelq
> zvrhpj.
>
> L paa quqshdcy prmpadth njog ncfwjrfy ums xkfsn rliofend hmaollh kxgns
> dntplib. M qbb ivep lvhs. B mpeuq naju jwajtv wjsae abwwwfspeesl. W
> itt eral lyclvx hludt lvigpfqjp. P bmm cpyfk jvmgp; c eqhbk wjoo fvvnr
> bjqp.
>
> T hixlm yiqq wessydr ko d wkiojbbt zmvs doff rxrf xr dvsdpddjps, tn ubo
> fsb nde nojrxkdhpvfxa nd Ftfjlkai gsv gmgu lyhe wycsrme im cpy
> tufgefwxlhyr pf sudse.
>
> B hss najqv kwnhq tvuuxxnxl xxe...
>
>
> Pendergarth
>
>
> Sent via Deja.com http://www.deja.com/
> Before you buy.
------------------------------
From: "Michael Brown" <[EMAIL PROTECTED]>
Subject: 3D-Now any good?
Date: Mon, 11 Dec 2000 22:13:43 +1300
Hi there,
I was wondering if anyone had tried implementing RSA using 3D-Now? If so,
did it work well?
Thanks
------------------------------
From: "Michael Brown" <[EMAIL PROTECTED]>
Subject: Re: Revised cipher
Date: Mon, 11 Dec 2000 22:21:19 +1300
<Dons fireproof suit>
Why not just use Pascal? It's way more definitive with blocks :)
hehehee
== snip snip snip ==
== chop ==
------------------------------
From: "Michael Brown" <[EMAIL PROTECTED]>
Subject: Re: breaking rsa knowing the original text
Date: Mon, 11 Dec 2000 22:25:03 +1300
Bill Unruh <[EMAIL PROTECTED]> wrote in message
news:90rnvp$a4e$[EMAIL PROTECTED]...
> In <90p364$77i$[EMAIL PROTECTED]> [EMAIL PROTECTED] writes:
>
> >Hi, I'm trying to figure out if it is much simpler to
> >break RSA knowing the original text and the encrypted text.
> >I mean, if some knows the original and the encrypted data
> >how easy will be to get the key ??
>
> Since anyone in the world can generate as many encrypted-plaintext pairs
> as they want, all estimates of RSA strength assume that the attacker has
> an unlimited supply of such pairs. As far as is known the only way to
> break RSA is to factor the modulus. There may be some other way, but
> "nobody" knows it.
The way I'm working on should do it very quickly. ~ 4 hours for a 4096 bit
key (on a P3-550). Lots of RAM is required (about 150ish MB).
------------------------------
From: "Michael Brown" <[EMAIL PROTECTED]>
Subject: Re: Modular Arithmetic
Date: Mon, 11 Dec 2000 22:28:49 +1300
And for single bits a, b in {0, 1}: (a + b) mod 2 = a + b - 2ab
Don't know about higher moduli
Paul Pires <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]...
>
> korejwa <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]...
> > Can anyone recommend a good source for learning about the MOD function?
> > I find myself figuring out its rules and properties by experiment
> > because there doesn't seem to be any 'complete' description of it
> > anywhere.
>
> Some things are so known
> and understood that you have a devil of a time finding
> definitions.
>
> Try "Remainder" as in:
>
> 10/7 = 1 remainder 3
>
> It just yields the remainder after the integer multiples have
> been divided out. If you are looking at code, watch for shortcuts
> in C.
>
> x mod 2 (x%2) is equivalent to x&1 (Logical AND of x and 1)
> x mod 64 (x%64) ~ x&63.....etc. Real handy for lopping off those
> unwanted bits.
>
> Paul
------------------------------
From: Benjamin Goldberg <[EMAIL PROTECTED]>
Subject: Re: [Question] Generation of random keys
Date: Mon, 11 Dec 2000 09:51:11 GMT
Simon Johnson wrote:
>
> In article <8vbfut$e31$[EMAIL PROTECTED]>,
> Alan Rouse <[EMAIL PROTECTED]> wrote:
> > The hard part is knowing how much entropy you have collected.
> > Perhaps you want to create a 160-bit key, so you need 160 bits of
> > entropy.
> > How can you be certain that you have that much? Tough question.
> >
> > Yes you can be "very conservative" and collect 2x, or 10x, or 100x
> > as much entropy-containing data as you think you need. But you
> > still don't KNOW how much you have. Perhaps the entire sample is
> > completely determined by a relatively small number of controlling
> > factors. This is a difficult problem that cannot be solved by
> > software. It requires careful study on a case-by-case basis. And
> > even then you won't KNOW.
> >
>
> The real problem is not whether or not you can collect the maximum
> entropy. The problem is whether the cryptanalyst can exploit it if it
> isn't. If a coin has a bias of +-1 x 10^-36 and we generate 128 bits
> from it, for a key, the security of the system will be unnoticeably
> degraded.
I assume that you mean 128 bits using the raw value of 128 flips.
If I were using a coin for entropy, I'd first flip and record 128 times,
then calculate the order 0 entropy per bit, then flip and record a few
more times until I was certain I had enough flips to contain 128 bits of
entropy, then take a secure hash of the result. This is rather like the
approach used in Yarrow.
Why would I do this? Because I don't know that the coin has as low a
bias as you say. I have to measure it to trust the amount of bias.
> If you're really paranoid, we can create a near perfect entropy
> collection by flipping a load of coins in parallel then XORing the
> flips. Any bias in the resultant output will degrade exponentially as
> the number of coins increases.
There is no need to flip the coins in parallel. One coin flipped twice
produces the same entropy as two coins flipped simultaneously --
assuming they have the same amount of bias, of course. The only
possible advantage of simultaneity is speed.
> Or, you could just use the radioactive decays of Plutonium to generate
> your keys? In which case, you will get perfect entropy collection,
> provided you collect bits at high resolution.
Almost, but not quite. Assuming that you measure the time difference
between two pairs of decay events, and use 0 for shorter, and 1 for
longer, you have to deal with the fact that your source is getting used
up, and on average, the second pair of decay events will have a longer
interval between them than the first pair, so there will be a bias
towards outputting a 1 bit.
You also have to deal with the fact that if two events happen close to
simultaneously, then the measured duration of the interval will be
anomalous. How long your detector takes to recover from a decay event
before being able to measure the next one determines how much of an
error there will be. Even worse, if your radioactive source is *very*
active, it might be constantly bombarding your detector, producing
either *no* measured events, or a continuous stream of measured events
occurring at a constant interval equal to the recovery time of the
device.
--
There are three methods for writing code in which no bug can be found:
1) Make the code so straightforward that there are obviously no bugs.
2) Make the code so complicated that there are no obvious bugs.
3) Insist that any apparent bugs were really intentional features.
------------------------------
From: Benjamin Goldberg <[EMAIL PROTECTED]>
Subject: Re: EDDF: the intended audience
Date: Mon, 11 Dec 2000 09:51:15 GMT
Roger Schlafly wrote:
>
> denis bider wrote:
> > I think you will agree that, ignoring the human-readability
> > argument, having a text-based format only brings a bunch of problems
> > and not much else.
>
> No.
One occasional disadvantage with some text based formats is that the
parser might be sensitive to the difference between LF and CRLF. A
person who sees that a file is text will tend to automatically send it
as ascii, not binary, without thinking. This causes problems.
If a person sees that a file is binary, then there is no question that
he'll send it as binary (unless he's stoopid).
--
There are three methods for writing code in which no bug can be found:
1) Make the code so straightforward that there are obviously no bugs.
2) Make the code so complicated that there are no obvious bugs.
3) Insist that any apparent bugs were really intentional features.
------------------------------
From: Benjamin Goldberg <[EMAIL PROTECTED]>
Subject: Re: Array shuffling
Date: Mon, 11 Dec 2000 09:51:18 GMT
Mok-Kong Shen wrote:
>
> Benjamin Goldberg wrote:
> >
> [snip]
> > int int_shuffle_array(
> > unsigned int * arr, unsigned int * buf, int len )
> > {
[snip]
> > }
>
> Could you give an (tiny) example call to ease understanding?
> I am not sure that it is clear that the shuffled result
> is not 'correlated' with the original array in some way.
> Thanks.
>
> M. K. Shen
I'm not certain what you mean. I can give a tiny example, but I don't
see what it will show you.
EiC 95> #include radix_shuffle.txt
(void)
EiC 96> unsigned int array[5]={1,2,3,4,5}, buffer[5];
(void)
EiC 97> int_shuffle_array( array, buffer, 5 );
13
EiC 98> printf("%d,%d,%d,%d,%d\n",\
EiC 99> array[0],array[1],array[2],array[3],array[4]);
4,2,5,1,3
10
If you're curious, EiC is an interactive C interpreter, whose homepage
is at <URL=http://www.kd-dev.com/~eic/>.
The reason the history number was high was due to my using EiC as a
calculator, since it's much easier than "calc" to see if everything's
typed in right, and to parenthesize, repeat/edit prior equations, etc.
Especially for doing things like 3*log(3)/log(2), then 5*log(5)/log(2),
etc. Perhaps Maple or Matlab might be better for that, but I don't have
them.
--
There are three methods for writing code in which no bug can be found:
1) Make the code so straightforward that there are obviously no bugs.
2) Make the code so complicated that there are no obvious bugs.
3) Insist that any apparent bugs were really intentional features.
------------------------------
From: Benjamin Goldberg <[EMAIL PROTECTED]>
Subject: Re: Array shuffling
Date: Mon, 11 Dec 2000 09:51:21 GMT
Scott Fluhrer wrote:
>
> Benjamin Goldberg <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]...
> > Having recently looked at the sci.crypt FAQ, I noticed that the
> > array shuffling algorithm requires that you use a modran function.
> >
> >for( n = ARRLTH-1; n > 0 ; n-- ) swap(&arr[modran( n+1 )], &arr[n]);
> >
> > Considering that, with an arbitrary parameter, each call to modran
> > takes a nondeterministic amount of time to complete, and further is
> > generally not the most efficient use of random bits, I took it upon
> > myself to write my own array shuffler, which uses random bits in the
> > most efficient way possible.
>
> Actually, if you're looking to use the random bits as efficiently as
> possible, it's actually rather simple: use your random bit stream to
> select an integer between 0 and N!-1, and use that integer to select
> the permutation. Using a maximally efficient modran function, this
> takes an expected log2(N!)+O(1) random bits. Your algorithm uses
> strictly more than this algorithm for N>2.
Where could I find a description of how a maximally efficient modran
function works? Most of the ones I've seen simply aren't.
> > I submit it here for your review and comment on. Although this
> > algorithm also takes a nondeterministic amount of time to complete,
> > it should use N*log(N) bits on average, and the nondeterministic
> > part is where you can see it, in the do/while loop, not hidden in
> > random number function. This shuffler should not be biased if
> > randbit() is not biased.
>
> Comments:
>
> - This algorithm looks like a randomized quicksort (minus the pivot
> element), while the classical algorithm can be viewed as a randomized
> selection sort (with optimizations).
This algorithm is actually a randomized two bucket radix sort. Perhaps
it looks a bit like quicksort, because of how I use buf to make two
buckets (left bucket and right bucket), but it's not intended to
resemble quicksort.
> - This does appear to be unbiased, in the sense that it produces all
> permutations with probability 1/N!, if all randbit() outputs are
> unbiased and independent. I think I see an outline of a proof -- have
> you proved it yourself?
I've proved to myself (by construction, doing a diagram with pen and
paper) that with N=3, the shuffle is unbiased, but I have not
constructed a rigorous proof.
> - Are you sure that your algorithm uses fewer expected random bits
> than the classical algorithm? At N=3, the classical algorithm uses an
> expected 3 2/3 random bits, while your algorithm uses an expected 5
> bits. For N=4, the expected number of random bits are 5 2/3 for
> classical, and 8 2/7 for yours.
I am pretty certain that my algorithm uses an average of N*log(N) bits,
simply because that is the number of tests radix sort needs to sort an
array size N using 2 buckets. It should be trivial to convert a proof
of the time complexity of radix sort into a proof of the complexity of
this algorithm.
For N=3, this means 4.7549 bits; for N=4, this means 8 bits. What do
you mean "3 2/3", "5 2/3" and "8 2/7"? Three-and-two-thirds,
five-and-two-thirds, eight-and-two-sevenths?
Assuming that's what you mean, how do you calculate that the classical
algorithm (the one with the swapping) uses 3.6667 bits to sort 3 items?
The "optimal" algorithm, one which simply selects one of N! items,
should take 2.5850 bits on average for N=3.
> In addition, as I referenced earlier, a maximally efficient modran
> function (which can be efficiently computed) can make an unbiased
> selection of N items using an expected log2(N)+O(1) random bits --
> have you looked at the classical algorithm using that?
Not yet... where do I find it? And you seem to have left out a "!"
after N (in log2(N)+O(1)).
> - Have you analyzed the expected number of random bits your algorithm
> uses?
> You may want to look at a formal analysis of quicksort (e.g. in Knuth)
> -- the lack of a pivot element will change the analysis somewhat, but
> I'd expect it to remain similar.
This is based on radix sort, not quicksort. However, I have seen an
analysis of radix sort... but it was a long while ago.
> - One distinction is that your algorithm uses a destination buffer,
> while the classical algorithm is in-place. For some applications,
> this may be important.
It's a time space tradeoff. If I did swaps, it would not need the
buffer, and thus use less space, but a swap operation is three
assignments, whereas using the buffer is one assignment per item.
> - Have you heard of the obscure technique known as "indenting"? Makes
> your code much easier to read.
If your usenet viewer is stupid enough to strip leading whitespace, that
is NOT my fault, nor my problem.
[snip mangled code]
My code *does* have whitespace in it. Why your viewer stripped it, I
don't know.
> Your rand() may be poor in the low bits. It's a common failing with
> rand() implementations.
Curiously, testing seems to show that if 2**17 tests are made, the low
bit of rand() is 0 exactly 2**16 times. If 2**18 tests are made, the
second lowest bit is 0 exactly 2**17 times. This progression holds for
the lowest 4 bits, but I don't have the patience to test if this holds
for the rest of the bits.
Over its entire period, rand apparently doesn't have bias, and yet, the
shuffling used more bits than it should have.
Anyone want to postulate why? I certainly don't know.
--
There are three methods for writing code in which no bug can be found:
1) Make the code so straightforward that there are obviously no bugs.
2) Make the code so complicated that there are no obvious bugs.
3) Insist that any apparent bugs were really intentional features.
------------------------------
** FOR YOUR REFERENCE **
The service address, to which questions about the list itself and requests
to be added to or deleted from it should be directed, is:
Internet: [EMAIL PROTECTED]
You can send mail to the entire list by posting to sci.crypt.
End of Cryptography-Digest Digest
******************************