Cryptography-Digest Digest #601, Volume #10 Sun, 21 Nov 99 04:13:05 EST
Contents:
Re: Simpson's Paradox and Quantum Entanglement ("rosi")
Re: Group English 1-1 all file compressor (Tim Tyler)
Re: Group English 1-1 all file compressor (Tim Tyler)
Re: Distribution of intelligence in the crypto field (Aidan Skinner)
Re: AES cyphers leak information like sieves (John Savard)
Crypto Stats for Other Languages ("@li")
Blowfish ("Eric Anderson")
Re: Nova program on cryptanalysis -- also cipher contest ("@li")
Re: Letter Frequency in English Texts vs. Name Lists ("Arnaud Guillon")
Re: The DVD Hack: What Next? ("Arnaud Guillon")
--- sci.crypt charter: read before you post (weekly notice) (D. J. Bernstein)
----------------------------------------------------------------------------
From: "rosi" <[EMAIL PROTECTED]>
Crossposted-To: comp.ai.fuzzy,sci.physics,sci.math
Subject: Re: Simpson's Paradox and Quantum Entanglement
Date: Sat, 20 Nov 1999 19:54:33 -0500
How beautiful!
I doubt if anyone doubts that after the first question by Hilbert
was answered there would be little doubt left. However, if you
take an eraser and go through the textbooks TODAY and find
anything that you would not particularly hesitate to rub out, ... :)
Thanks, John, for the wonderful piece!
--- (My Signature)
John Forkosh wrote in message <816bkj$lie$[EMAIL PROTECTED]>...
>[EMAIL PROTECTED] wrote:
>: Simpson's Paradox:
>: http://curriculum.qed.qld.gov.au/kla/eda/sim_par.htm
>: Simpson's Paradox is a statistical artifact ...<snip>
>
>There's a textbook treatment in "Quantum Probability",
>Stanley P. Gudder, Academic Press 1988, ISBN 0-12-305340-4,
>pages 102-106, which concludes it's not a problem.
>John ([EMAIL PROTECTED])
>
>P.S. To see the "classical problem" consider, e.g., a college
>with Law and Business schools, interested in its admissions
>of Men vs. Women. It tabulates
> #accepted/#applied=%accepted
>for Men and Women at each school, finding %accepted is greater
>for Women at both schools individually, but greater for Men
>when the schools are combined. How's this possible? Consider...
> Law School Business Combined
> ----------------------------------------------
> Men 18/120=15% 180/240=75% 198/360=55%
> ----------------------------------------------
> Women 24/120=20% 64/80 =80% 88/200=44%
>The problem is that the "combining rule" is a/b,c/d --> (a+c)/(b+d)
>which isn't a typical arithmetic operation, though it does model
>the "word question" posed by the college.
> Arithmetically, we have 18/120 < 24/120 and 180/240 < 64/80,
>and we're intuitively concluding (18+180)/(120+240) < (24+64)/(120+80).
>Substitute symbols, and a little algebra shows this isn't generally true.
>(Note: It is true if the denominators at each school are equal,
>e.g., multiply the Business Women by 3/3.)
> Thus, ultimately, the "word question" isn't really well-posed
>in terms of percentages, because division isn't linear in the sense
>assumed by the problem.
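[The percentages above are easy to verify mechanically. A minimal Python sketch, using exact rational arithmetic and the figures taken directly from the table:]

```python
from fractions import Fraction

# Admission figures from the table above: (accepted, applied).
law      = {"men": (18, 120),  "women": (24, 120)}
business = {"men": (180, 240), "women": (64, 80)}

def rate(accepted, applied):
    return Fraction(accepted, applied)

# The "combining rule" a/b, c/d --> (a+c)/(b+d) from the post.
def combine(x, y):
    return Fraction(x[0] + y[0], x[1] + y[1])

# Women lead at each school individually...
assert rate(*law["women"]) > rate(*law["men"])            # 20% > 15%
assert rate(*business["women"]) > rate(*business["men"])  # 80% > 75%

# ...yet men lead once the schools are combined.
assert combine(law["men"], business["men"]) == Fraction(198, 360)     # 55%
assert combine(law["women"], business["women"]) == Fraction(88, 200)  # 44%
assert combine(law["men"], business["men"]) > \
       combine(law["women"], business["women"])
```

[All assertions hold, confirming that the reversal is an artifact of the combining rule, not of the individual rates.]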
------------------------------
Crossposted-To: comp.compression
From: Tim Tyler <[EMAIL PROTECTED]>
Subject: Re: Group English 1-1 all file compressor
Reply-To: [EMAIL PROTECTED]
Date: Sun, 21 Nov 1999 00:38:17 GMT
[crossposted due to apparent relevance]
William Rowden <[EMAIL PROTECTED]> wrote:
: I may have missed discussion on another thread that would make a
: difference in my understanding of your proposal. Nevertheless, here are
: my thoughts:
: In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] wrote:
:> However, if you allow multiple passes, the problem vanishes ;-)
: Doesn't the problem also vanish if one of the encoding rules is to
: encode the longest possible substring? You can implement this as a
: finite-state machine.
No. As far as I am aware, the constraints are as I listed them on
the http://www.alife.co.uk/securecompress/dictionary/ page.
To quote:
=====================
* No string in the tables should contain another such string as a
substring;
* No leading symbols in any string should exactly match the trailing
symbols in a different string.
Between them these conditions are both necessary and sufficient for
lossless, "one-on-one" compression to occur.
=====================
Simply choosing the longest possible substring is not a sufficient
condition for one-on-one compression to occur. You need the other
stuff I mentioned - which reduces the longest-first condition to
complete irrelevance ;-/
If anyone can reduce the strength or scope of these constraints, I
have yet to hear about the matter.
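[The two quoted conditions can be checked mechanically for a candidate table. A Python sketch - the function name and representation are mine, not from the quoted page:]

```python
def satisfies_constraints(strings):
    """Check the two quoted conditions on a candidate table of strings."""
    for s in strings:
        for t in strings:
            if s == t:
                continue
            # Condition 1: no string contains another as a substring.
            if s in t:
                return False
            # Condition 2: no leading symbols of one string exactly
            # match the trailing symbols of a different string.
            for k in range(1, min(len(s), len(t)) + 1):
                if s[:k] == t[-k:]:
                    return False
    return True

# "his" is a substring of "this ", violating the first condition:
assert not satisfies_constraints(["this ", "his"])
# These two entries satisfy both conditions:
assert satisfies_constraints(["the ", "and "])
```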
:> It doesn't seem to matter much which order you compress in for this
:> example: both schemes reduce the original 16 letters to 7.
: If you do this using the rule I suggest above, at once rather than in
: multiple passes, your examples aren't two alternative schemes; they are
: two different ways of expressing the same approach.
I am not convinced that - in general - multiple passes can be avoided
while retaining the same functionality.
The "take the longest string" method simply doesn't work on its own -
you need my constraints - see my web page on the subject for the details
of the reasons why.
:> Finding dictionaries that compress as well as is possible (with this
:> multiple pass scheme) will not be easy.
: If you want a compressor (that is not adaptive?)
That's right. My scheme uses a static dictionary. David's scheme is
adaptive.
: [...] for a source model that is some type of English, then there is a
: wealth of frequency information available.
Almost all of which appears to me to be practically useless. Programs
that generate frequency information from sample text might be hacked
into shape to accommodate my constraints. However, I do not yet have
a clear strategy for dealing with multiple dictionaries.
I think a longest-first method will produce good results in this area.
Applying dictionaries whose entries are all the same length avoids
the "no substring" condition completely. If most entries still end in
spaces, there's a fairly free range of possible words allowed.
At small word sizes, not all entries need end in spaces any more - and
more than one dictionary could be used.
As you can see, my plans in this area are still vague and fuzzy.
:> Can anyone present an argument about whether compressing short or
:> long substrings first is the better strategy?
: One argument is algorithm availability. An algorithm for finding the
: longest substrings using a finite-state machine is well-known and
: implemented in regular expression searching.
Hmm. The algorithms aren't /that/ hard for short strings. An
exhaustive search using a frequency table for the results is even
possible. I'm not sure the ease of getting long strings makes much difference
to the long-first or short-first question.
As you may have noticed, I'm currently thinking of using long-first
techniques, though.
[snip David's comment on the quantity of work]
: It doesn't look like a lot of work to write a test compressor. My
: Unix/Linux bias is about to show: If you know 'awk' (which uses regular
: expressions), you could write a short script. Another alternative is
: 'lex', which has the benefit of providing you with C source code.
Yes, it's not /that/ much work. My other projects mean that I have yet
to start on it, though ;-/
Alas - as I have mentioned - I see writing the compression program itself
as the smaller component of the problem.
--
__________
|im |yler The Mandala Centre http://www.mandala.co.uk/ [EMAIL PROTECTED]
I'm leaving my body to science fiction.
------------------------------
From: Tim Tyler <[EMAIL PROTECTED]>
Subject: Re: Group English 1-1 all file compressor
Reply-To: [EMAIL PROTECTED]
Date: Sun, 21 Nov 1999 00:48:25 GMT
William Rowden <[EMAIL PROTECTED]> wrote:
: In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] wrote:
:> [snip problem] if you allow multiple passes, the problem vanishes ;-)
:>
:> Dictionary 1: "this " <--> "#"
:> Dictionary 2: "his" <--> "^"
:> Dictionary 3: "is " <--> "@"
:>
:> ...produces:
:>
:> "this is his head" -1-> "#is his head" -2-> "#is ^head" -3-> "#@^head"
: IIRC, David's concept of a "1-1" compressor means that every file is a
: valid compressed file.
Yes, that's one implication of it.
: The idea is to make it impossible to decompress, recompress, and compare
: to the original file as a way of eliminating invalid plaintexts. Right?
Yes, that's more precisely his aim.
: A side benefit would be efficient use of the compression function's
: codomain.
Yes indeed.
: The example above does not have this property: "t^" decompresses to
: "this" but recompresses to "#".
No. If you apply these multiple dictionaries in *increasing sequence*
to compress, you need to apply them in decreasing sequence to decompress.
Decompress (in order):
Dictionary 3: "is " <--> "@"
Dictionary 2: "his" <--> "^"
Dictionary 1: "this " <--> "#"
["t^" indeed decompresses to "this"...]
Compress (in order):
Dictionary 1: "this " <--> "#"
Dictionary 2: "his" <--> "^"
Dictionary 3: "is " <--> "@"
[...but "this" recompresses to "t^" again, not to "#": dictionary 1's
"this " (with its trailing space) never matches the bare "this", and once
dictionary 2 has produced "t^" the pass for dictionary 1 is already over.]
Each dictionary /happens/ to have only one entry in this example.
This is not necessary - more than one entry is quite possible -
provided my original two constraints are adhered to.
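[The round trip can be sketched in a few lines of Python, applying the dictionaries in increasing order to compress and in decreasing order to decompress. These are naive str.replace passes for illustration only - a real implementation would also honour the two dictionary constraints:]

```python
# One substitution per pass; passes 1, 2, 3 compress,
# and undoing them in the order 3, 2, 1 decompresses.
dictionaries = [("this ", "#"), ("his", "^"), ("is ", "@")]

def compress(text):
    for plain, code in dictionaries:            # dictionaries 1, 2, 3
        text = text.replace(plain, code)
    return text

def decompress(text):
    for plain, code in reversed(dictionaries):  # dictionaries 3, 2, 1
        text = text.replace(code, plain)
    return text

original = "this is his head"
assert decompress(compress(original)) == original  # lossless round trip
assert decompress("t^") == "this"  # "t^" decompresses to "this"...
assert compress("this") == "t^"    # ...and recompresses to "t^", not "#"
```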
--
__________
|im |yler The Mandala Centre http://www.mandala.co.uk/ [EMAIL PROTECTED]
Infants don't enjoy infancy like adults enjoy adultery.
------------------------------
From: [EMAIL PROTECTED] (Aidan Skinner)
Subject: Re: Distribution of intelligence in the crypto field
Date: 20 Nov 1999 22:31:33 GMT
Reply-To: [EMAIL PROTECTED]
On Sat, 20 Nov 1999 20:18:06 GMT, Jim Dunnett
<[EMAIL PROTECTED]> wrote:
>On the other hand, why are they trying so hard to suppress strong
>crypto, key-escrow etc. if they can hack PGP, for example?
Rule number one of intelligence: never let your enemy know your
capabilities.
That's not to say that they *can* break anything - I don't know. But if
they could, I don't think that you'd see a change in policy.
- Aidan (who relies on not being of interest to foreign governments to
keep him free from surveillance and living a dull and uninteresting
life to keep him free from arrest)
--
"I say we just bury him and eat dessert"
http://www.skinner.demon.co.uk/aidan/
OpenPGP Key Fingerprint: 9858 33E6 C755 7D34 B5C5 316D 9274 1343 FBE6 99D9
------------------------------
From: [EMAIL PROTECTED] (John Savard)
Subject: Re: AES cyphers leak information like sieves
Date: Sun, 21 Nov 1999 01:43:37 GMT
On Sun, 21 Nov 1999 00:00:54 +0000, Toby Kelsey
<[EMAIL PROTECTED]> wrote:
>Check <[EMAIL PROTECTED]> posted on 02/02/1999 for
>the original exposition. Ironically I was correcting dscott at the time.
I checked at DejaNews, and found a post of that date in the thread
"What is Left to Invent". In there, you noted that if a message is
sufficiently well compressed that the probability of a sensible
message is greater than 1/2^N, for an N-bit key, ambiguity is created.
What David Scott is discussing is a method of compression where the
probability is 1 that a random string of bits will be valid compressor
output, although the probability that *other*, more sophisticated,
tests will indicate that the source message made sense depends on the
quality of the compression.
I don't know if that was original with him; I would have thought this
was something thought of long ago, but it isn't an idea that comes up
often.
------------------------------
From: "@li" <[EMAIL PROTECTED]>
Subject: Crypto Stats for Other Languages
Date: Sat, 20 Nov 1999 21:16:27 -0800
Hi,
I was wondering where I might look to find statistical data about languages
other than English. It seems that only English (or at least Latin)
character sets get analyzed. How about languages such as Hebrew and Arabic?
------------------------------
From: "Eric Anderson" <[EMAIL PROTECTED]>
Subject: Blowfish
Date: Sat, 20 Nov 1999 14:58:14 -0800
Could someone please send me, preferably via e-mail, the results of the
Blowfish algorithm using a 32-bit key of all zeros, with both the left and
right halves of the plaintext set to all zeros as well?
Thank you
Eric Anderson
[EMAIL PROTECTED]
------------------------------
From: "@li" <[EMAIL PROTECTED]>
Subject: Re: Nova program on cryptanalysis -- also cipher contest
Date: Sat, 20 Nov 1999 21:36:59 -0800
"Troed" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]...
> William Rowden <[EMAIL PROTECTED]> wrote:
>
> >only narrowed the locations to a score or so each. Attempting to
> >reconstruct the Playfair square for a few of these locations by hand
> >motivated the writing of a computer script to search the permutations.
> >
> >The computer's first pass with the script produced four partial Playfair
> >squares for testing--a number of squares this human was willing to try.
>
> I'm also writing a "solve Playfair" program at the moment ... not the
> same contest. So far I've managed to narrow down lots of rules on what
> a Playfair square can and cannot do, but I haven't really been able to
> get it all automated.
>
> What I have in the program so far is extraction of the bigrams and
> their counts; then it does some basic plaintext guessing, exchanging the
> most common cipher bigram for the most common bigram in the selected
> language, etc.
>
> If you have any brief examples of how to get "same row or char above"
> rules into C/C++ I'd be happy to have a look ;)
You can use a coordinate system based on 2-dimensional arrays (although I am
not sure how to define the size of 2-dimensional arrays at run-time). For
example:
1 2 3 4
1 a b c d
2 e f g h
3 j k l m
Then all you have to do is check the x's and the y's for a match,
so you can define "same row/column" to be when x1==x2 or y1==y2.
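[In Python (rather than the C/C++ asked about) the idea can be sketched without worrying about run-time array sizes. The 5x5 square here is just the alphabet in order with I/J merged, and the helper names are illustrative:]

```python
# A 5x5 Playfair square stored row-major; I/J merged as usual.
square = [list("abcde"), list("fghik"),
          list("lmnop"), list("qrstu"),
          list("vwxyz")]

# Precompute each letter's (row, column) coordinates once.
coords = {ch: (r, c)
          for r, row in enumerate(square)
          for c, ch in enumerate(row)}

def same_row(a, b):
    return coords[a][0] == coords[b][0]

def same_column(a, b):
    return coords[a][1] == coords[b][1]

def char_above(ch):
    """The letter one row up, wrapping around the square."""
    r, c = coords[ch]
    return square[(r - 1) % 5][c]

assert same_row("a", "e")
assert same_column("b", "w")
assert char_above("f") == "a"
```

[A dict of coordinates sidesteps the run-time array-size question entirely; in C++ a std::map or a vector of vectors would play the same role.]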
------------------------------
From: "Arnaud Guillon" <[EMAIL PROTECTED]>
Subject: Re: Letter Frequency in English Texts vs. Name Lists
Date: Sun, 21 Nov 1999 14:07:40 +0800
I wrote one yesterday for frequency analysis, for the Code Book cypher
challenges.
It takes any text file that you throw at it and generates an output text
file that tells you the occurrences and frequencies of individual
letters within the text.
It's written in VB6, but since I designed it for small text files I am
limited to 64k texts...
To go for bigger text files you would have to start manipulating Win95/98
memory addressing from within the program, which I could not be bothered to
do (lazy...).
In the same vein, I just created a small program which encrypts/decrypts
text files for Vigenere cyphers, and am now working on another program which
automatically generates homophonic cyphers from an input text.
Let me know if you need the code; it's rather straightforward stuff, I'm
afraid (I'm no professional programmer).
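[In case a non-VB version is useful: the same counting is only a few lines of Python, with no 64k limit. The function name is my own:]

```python
from collections import Counter

def letter_frequencies(text):
    """Tally alphabetic characters, returning {letter: (count, fraction)}."""
    counts = Counter(ch.lower() for ch in text if ch.isalpha())
    total = sum(counts.values())
    return {ch: (n, n / total) for ch, n in sorted(counts.items())}

freqs = letter_frequencies("Attack at dawn")
assert freqs["a"][0] == 4                 # 'a' occurs four times
assert abs(freqs["a"][1] - 4 / 12) < 1e-9  # out of twelve letters
```

[Reading a whole file is then just letter_frequencies(open(path).read()).]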
<[EMAIL PROTECTED]> wrote in message
news:81203k$h20$[EMAIL PROTECTED]...
> I am doing research on large sets of name lists.
>
> I have information on letter frequencies in
> common English texts, but I am curious as to how
> this compares to the letter frequencies in a
> large name set.
>
> Does anyone have a program that will calculate
> letter frequency patterns when given a text data
> file that they can share with me? Or even common
> letter frequencies for name sets?
>
> If you can help please e-mail me at
> [EMAIL PROTECTED]
------------------------------
From: "Arnaud Guillon" <[EMAIL PROTECTED]>
Subject: Re: The DVD Hack: What Next?
Date: Sun, 21 Nov 1999 14:22:54 +0800
As an addition to your post, there is another program named DODSRIP which
does the same stuff, from a DOS prompt though.
Lots of these utils could be found at www.dvdutils.com, but I think they
recently had to remove them after intervention from the big multimedia
companies.
This site and other similar sites include interesting instructions on
how to encode DVDs to MPEG-1 format. I have tried it using a package called
DVD2MPEG and it does indeed work very well.
Cheers
Mark Keiper <[EMAIL PROTECTED]> wrote in message
news:80s21s$50a$[EMAIL PROTECTED]...
> Ken Lee <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]...
> > They said the instructions for copying DVDs were posted on the Internet,
> anyone
> > knows what the url is?
> >
>
>
> I had just finished reading about it when I came across the DVD ripper in
> (of all places!) www.download.com
>
> Was looking for a totally unrelated PD application when this sprouted up
> (I apparently was the first to DL it according to download.com stats).
>
> file name: decss121b.zip
>
> One thing though- the ripping process requires buttloads of disk space
> (5 to 10 gigs) for average sized movies according to the article I read.
>
>
> Mark Keiper
> [EMAIL PROTECTED]
------------------------------
From: [EMAIL PROTECTED] (D. J. Bernstein)
Crossposted-To: talk.politics.crypto
Subject: --- sci.crypt charter: read before you post (weekly notice)
Date: 21 Nov 1999 06:00:35 GMT
sci.crypt Different methods of data en/decryption.
sci.crypt.research Cryptography, cryptanalysis, and related issues.
talk.politics.crypto The relation between cryptography and government.
The Cryptography FAQ is posted to sci.crypt and talk.politics.crypto
every three weeks. You should read it before posting to either group.
A common myth is that sci.crypt is USENET's catch-all crypto newsgroup.
It is not. It is reserved for discussion of the _science_ of cryptology,
including cryptography, cryptanalysis, and related topics such as
one-way hash functions.
Use talk.politics.crypto for the _politics_ of cryptography, including
Clipper, Digital Telephony, NSA, RSADSI, the distribution of RC4, and
export controls.
What if you want to post an article which is neither pure science nor
pure politics? Go for talk.politics.crypto. Political discussions are
naturally free-ranging, and can easily include scientific articles. But
sci.crypt is much more limited: it has no room for politics.
It's appropriate to post (or at least cross-post) Clipper discussions to
alt.privacy.clipper, which should become talk.politics.crypto.clipper at
some point.
There are now several PGP newsgroups. Try comp.security.pgp.resources if
you want to find PGP, c.s.pgp.tech if you want to set it up and use it,
and c.s.pgp.discuss for other PGP-related questions.
Questions about microfilm and smuggling and other non-cryptographic
``spy stuff'' don't belong in sci.crypt. Try alt.security.
Other relevant newsgroups: misc.legal.computing, comp.org.eff.talk,
comp.org.cpsr.talk, alt.politics.org.nsa, comp.patents, sci.math,
comp.compression, comp.security.misc.
Here's the sci.crypt.research charter: ``The discussion of cryptography,
cryptanalysis, and related issues, in a more civilised environment than
is currently provided by sci.crypt.'' If you want to submit something to
the moderators, try [EMAIL PROTECTED]
---Dan
------------------------------
** FOR YOUR REFERENCE **
The service address, to which questions about the list itself and requests
to be added to or deleted from it should be directed, is:
Internet: [EMAIL PROTECTED]
You can send mail to the entire list (and sci.crypt) via:
Internet: [EMAIL PROTECTED]
End of Cryptography-Digest Digest
******************************