Cryptography-Digest Digest #836, Volume #11      Mon, 22 May 00 13:13:01 EDT

Contents:
  Re: Blowfish and Weak Keys ("Kasper Pedersen")
  Practical Blowfish Usage C++ Implementation ([EMAIL PROTECTED])
  Re: NSA hardware evaluation of AES finalists (Paul Koning)
  Re: pentium timings (Jerry Coffin)
  Re: pentium timings (Jerry Coffin)
  Re: pentium timings (Jerry Coffin)
  Re: pentium timings (tomstd)
  Re: Blowfish Claims ... something not right (tomstd)
  Re: Matrix reduction (Chris Card)
  Re: Practical Blowfish Usage C++ Implementation (Mark Wooding)
  Re: pentium timings (Jerry Coffin)
  Re: Matrix reduction (Chris Card)
  Re: More on Pi and randomness (Mark Wooding)
  Re: AEES-Cascade (Mark Wooding)

----------------------------------------------------------------------------

From: "Kasper Pedersen" <[EMAIL PROTECTED]>
Subject: Re: Blowfish and Weak Keys
Date: Mon, 22 May 2000 15:20:37 GMT


<[EMAIL PROTECTED]> wrote in message
news:8g0pb9$vmi$[EMAIL PROTECTED]...
> I have extended the attack on weak keys in Blowfish.  You are correct
> there is no way to exploit weak keys in general.  If a key is known to
> be weak however, then a faster than exhaustive search can be mounted.
> The attack essentially stops each guess after running the P generation
> for the schedule.  Since 512 more encryptions are required to set up the
> s-boxes, the attack is much faster than exhaustive search. On the down
> side, the best attack on a weak key is only 128 times faster than
> exhaustive search.  With a 128-bit key, 2^120 full guesses will be
> required, pretty stiff.
>

If I understand correctly, you will still be trying every input key? Then
it's similar to using 13/14/15-round DES in a DES keysearch; you just
decide early whether you want to complete the test of a specific key.

If so, he should NOT filter out weak keys; it would only make breaking his
implementation about 0.5% faster (isn't it 1 in about 200-something?) than
brute force.

/Kasper



------------------------------

From: [EMAIL PROTECTED]
Subject: Practical Blowfish Usage C++ Implementation
Date: Mon, 22 May 2000 15:42:57 GMT

I am attempting to use the Blowfish algorithm to decrypt a few files on
my hard disk, that's all I need it for.  I have found a basic C++
implementation written by Jim Conger in 1996. This file,
"bfsh-con.zip", may be downloaded from Counterpane's site and it is a
public implementation.

I need some help properly utilizing this implementation, as I am
not understanding how to use it in a practical manner. For instance,
there is a public method in the Blowfish class called GetOutputLength()
and I'm not sure why it is included.

More alarming, the ciphertext produced by the Encode() method
repeats and does not appear random to me. I think I may be using it
incorrectly.  I have written a small program to demonstrate.  I used a
16-byte key, hence the pass phrase is the same as the key. I did not
create a new key from the pass phrase using some one-way hash method,
though I intend to do this later.

Here's the output:

Plaintext:
3132333435363738313233343536373831323334353637383132333435363738
3132333435363738313233343536373831323334353637383132333435363738

Ciphertext:
df8bb5d6be30cfe3df8bb5d6be30cfe3df8bb5d6be30cfe3df8bb5d6be30cfe3
df8bb5d6be30cfe3df8bb5d6be30cfe3df8bb5d6be30cfe3df8bb5d6be30cfe3

Back to plaintext:
3132333435363738313233343536373831323334353637383132333435363738
3132333435363738313233343536373831323334353637383132333435363738

#include <cstdio>
#include <cstring>
#include "blowfish.h"  // Jim Conger's C++ implementation

typedef unsigned char uint;  // Conger's code uses 'uint' for byte pointers

void show_text_in_hex(const char* title, const char* pbuf, int len);

int main()
{
   // password is 16 bytes and plaintext is 64 bytes long
   const char * password = "passwordabcdefgh";
   const char * plaintext = "123456781234567812345678"
                            "123456781234567812345678"
                            "1234567812345678";
   CBlowFish bf;
   bf.Initialize((uint *)password, strlen(password));

   char buffer[128];
   strcpy(buffer, plaintext);
   show_text_in_hex("Plaintext:", buffer, 64);

   // in buffer is the same as out buffer
   bf.Encode((uint *)buffer, (uint *)buffer, 64);
   show_text_in_hex("Ciphertext:", buffer, 64);

   bf.Decode((uint *)buffer, (uint *)buffer, 64);
   show_text_in_hex("Back to plaintext:", buffer, 64);

   return 0;
}

void show_text_in_hex(const char* title, const char* pbuf, int len)
{
   printf("%s\n", title);
   const uint * up = (const uint *) pbuf;
   for (int i = 0; i < len; i++) {
      printf("%02x", *up++);
      if (i == 31) printf("\n");
   }
   printf("\n\n");
}

Thanks for any help
Mike

------------------------------

From: Paul Koning <[EMAIL PROTECTED]>
Subject: Re: NSA hardware evaluation of AES finalists
Date: Mon, 22 May 2000 11:08:19 -0400

Scott Fluhrer wrote:
> 
> Paul Koning <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]....
> > Of course you can do parallel processing on several queued packets
> > of a single stream (provided it's stateless, e.g., IPsec).  That helps
> > a bit...
> 
> Also, I would assume that if you did have a stream that did 300 MB/s, that
> you would choose a mode that makes parallel processing on the same packet
> easy, such as counter mode or an aggressively interleaved mode.  This helps
> somewhat more...

I'd like that.  But that's not the reality of current protocol
standards.  IPsec uses CBC only, no other modes, in particular not
counter or interleaved modes.

Note that 300 Mb/s single stream 3DES CBC is currently available
(off the shelf, single chip) technology, or very nearly so (i.e., 
chips coming out within the next few months).

        paul

------------------------------

From: Jerry Coffin <[EMAIL PROTECTED]>
Subject: Re: pentium timings
Date: Mon, 22 May 2000 10:05:02 -0600

In article <[EMAIL PROTECTED]>, 
[EMAIL PROTECTED] says...

> It is unequivocally incorrect to call RDTSC and expect it
> to return anything resembling what you may expect unless you
> serialize the processor first. The easiest way to do this is to
> execute the CPUID instruction before every instance of RDTSC.

As I said, there's more to this than meets the eye -- even just 
calling CPUID isn't sufficient: the problem is that Intel documents 
CPUID as taking a different length of time to execute the first few 
times it gets used after booting up (exactly why I don't know, but 
that's what Intel says).

Therefore, to make things work correctly, you need to use CPUID at 
least three times, then the first RDTSC, the code to be timed, then 
CPUID again, and finally the last RDTSC.

That deals with the serializing issues.  Then you have to deal with 
code alignment: these processors can be quite sensitive to code 
alignment as well; to get consistent results, you need to use at 
least paragraph alignment.
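Putting that recipe together, a minimal sketch in the same NASM-style assembly as the snippet under discussion (register clobbers, memory operands, and the timer-overhead calibration are left out for brevity, so treat this as an outline rather than drop-in code):

```asm
; Warm up CPUID first: Intel documents its first few executions
; after boot as taking a different length of time.
cpuid
cpuid
cpuid
rdtsc                  ; start timestamp, serialized by the CPUIDs above
mov [t0_lo], eax
mov [t0_hi], edx
; ... code to be timed ...
cpuid                  ; serialize: everything above must retire first
rdtsc                  ; stop timestamp
sub eax, [t0_lo]
sbb edx, [t0_hi]       ; edx:eax = elapsed cycles (timer overhead included)
```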

-- 
    Later,
    Jerry.
 
The universe is a figment of its own imagination.

------------------------------

From: Jerry Coffin <[EMAIL PROTECTED]>
Subject: Re: pentium timings
Date: Mon, 22 May 2000 10:05:04 -0600

In article <[EMAIL PROTECTED]>, 
[EMAIL PROTECTED] says...

[ ... ] 

> I will go ahead and disagree with you now.  When timing my RC5
> code I always get the same clocks per block output, no matter
> what is running.

You can disagree all you want, but it only shows that you don't know 
what you're talking about.

The problem isn't that you'll get inconsistent results with the same 
code.  The problem is that if you change the code, the indications 
given by your timer will NOT necessarily mean anything.  It may say 
the new code is faster, when in reality it's exactly the same speed 
as the old code, or perhaps even slower than the old code.  Likewise, 
it could say the new code is slower when in fact it's the same speed 
or faster.
 
> What you have to do is call 'timer(void *)' more than once and
> take the average to get a good idea of what's going on.

Oh ye of little knowledge of statistics!  An average will NOT fix the 
sort of fundamental problems you've got in your code.  It's true that 
with properly functioning timing, you still need to run code more 
than once to time it properly.  This is NOT, however, to make up for 
errors in the timing.  First of all, you want to ensure that the code 
is in the cache before you time it (unless you're specifically timing 
something related to cache usage).  Second, you _may_ get interrupts 
during your timing.  If an interrupt occurs, however, you will get a 
bimodal distribution (assuming everything else is working correctly).

In this case, you should NOT (most _definitely NOT_) average the 
times: the time containing interrupt servicing is completely spurious 
and unrelated.  It should be completely _ignored_ in the final 
output.

Using an average assumes that errors in either direction are equally 
likely.  With properly functioning timing code, errors on the low 
side absolutely can NOT happen.  Therefore, in this case you want to 
use the minimum of the times you get, NOT the average.
 
> Even still if the rdtsc is part of the start/ending of the
> tested code, that's why it's called 512 times.  So the majority
> of the used cycles is in the called code and not the timer code.

This isn't true, and even if it were, it would still be irrelevant.  
Yes, making 512 calls to the code being timed can reduce the 
magnitude of the error to some degree, but even with this (ridiculous) 
number of calls, the error can still be just under 8%.

To summarize: you've written code that's fundamentally wrong, and 
then taken a brute-force approach to minimizing the magnitude of the 
errors you produce.

> This code is for estimating and profiling, not for exact
> measurements.

You're completely ignoring the fact that it's just plain _wrong_.  To 
know whether it's safe to use it, somebody basically has to know a 
LOT more about things than you do, and would be able to write FAR 
better code themselves with considerably less effort than it would 
take to decide whether they're willing to ignore the errors from your 
code.

In short, your code is worse than useless: for the few people who 
could theoretically use it, it's more trouble than it's worth.  For 
everybody else, it can and will produce misleading and incorrect 
results, and worse yet, produce them with enough consistency that 
most people are likely to actually believe its results.

Publishing code like this (even for free) is grossly irresponsible.

-- 
    Later,
    Jerry.
 
The universe is a figment of its own imagination.

------------------------------

From: Jerry Coffin <[EMAIL PROTECTED]>
Subject: Re: pentium timings
Date: Mon, 22 May 2000 10:12:48 -0600

In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] 
says...

[ ... ] 

> The problem is that OS switches between tasks and does some
> management work thereby. You don't know how much switching
> time has been put onto your account.

That's a long way from being "The problem" -- it's merely one of the 
_many_ problems. 

-- 
    Later,
    Jerry.
 
The universe is a figment of its own imagination.

------------------------------

Subject: Re: pentium timings
From: tomstd <[EMAIL PROTECTED]>
Date: Mon, 22 May 2000 03:44:59 -0700

In article <[EMAIL PROTECTED]>,
lordcow77 <[EMAIL PROTECTED]> wrote:
>In article <[EMAIL PROTECTED]>, tomstd
><[EMAIL PROTECTED]> wrote:
>>I will go ahead and disagree with you now.  When timing my RC5
>>code I always get the same clocks per block output, no matter
>>what is running.
>
>You don't seem to understand why your implementation of a cycle-
>level timing routine is inherently flawed and cannot be relied
>on to give correct results, except by accident. It's much like
>declaring main to return void in C: your compiler may choose, at
>its infinite discretion, to emit code that does what you want,
>or alternatively, turn your computer into a toaster. RDTSC is
>not a serializing instruction; the CPU may choose to execute it
>at any point while it is in the instruction reorder buffer
>waiting for dispatch to its corresponding functional unit.
>You cannot rely on the CPU to wait until a specific instruction
>follows in lexicographical order (i.e. in the neat disassembly
>that you may see) to actually execute an instruction. Moreover,
>the RDTSC may run in parallel with other instructions, may be
>stalled in the pipeline, may run into a pipeline bubble because
>of a register contention issue on an entirely separate
>instruction, may be flushed from the pipeline because of data
>dependency problems, and its result may not even be committed to
>programmer-visible register space for an undefined period of
>time UNLESS there is an explicit dependency, upon which the CPU
>will respect apparent sequential execution of the instruction
>stream. In other words, if it works at all, you are exceedingly
>lucky. It is unequivocally incorrect to call RDTSC and expect it
>to return anything resembling what you may expect unless you
>serialize the processor first. The easiest way to do this is to
>execute the CPUID instruction before every instance of RDTSC.

If you look at my code I have

rdtsc
push eax
push edx
times 512 call [temp]
rdtsc
pop ebx
pop ecx
sub eax,ebx ; cpu must finish rdtsc here
sbb edx,ecx

Well if the cpu doesn't do the rdtsc, won't it store the wrong
values?  I can see my routines being off by a couple cycles
because of the pop ebx/ecx, but other than that, it should be
close.

The whole point of my code is just to get a vague idea of how
well the code is performing.  With my RC5 code, if I put a
couple 'nops' outside the loop, I can tell the cycle
count goes up by one or two using these exact same routines.

Tom


* Sent from RemarQ http://www.remarq.com The Internet's Discussion Network *
The fastest and easiest way to search and participate in Usenet - Free!


------------------------------

Subject: Re: Blowfish Claims ... something not right
From: tomstd <[EMAIL PROTECTED]>
Date: Mon, 22 May 2000 03:50:49 -0700

In article <[EMAIL PROTECTED]>, Bohdan Tashchuk
<[EMAIL PROTECTED]> wrote:
>Paul Rubin wrote:
>>
>> >8 cycles a byte with blowfish?  That's hard to believe, mainly
>> >because the look ups in the F function would be a source of
>> >bottleneck.
>>
>> Wait a sec, I must have meant 8 cycles per *round*.  This is coming
>> back to me.  That would be 16 cycles/byte.
>
>The value of 16 cycles/byte corresponds closely to the performance
>claim by Schneier et al. in the book they wrote called
>
>       The Twofish Encryption Algorithm
>       ISBN 0-471-35381-7.
>
>They give a lot of different values for various options of
>pre-computing keys, various numbers of plaintext bytes, ASM versus C,
>etc. As an asymptotic case see:
>
>       Table 5.3. Best Speed to Encrypt a Message
>       with a New 128-bit Key on a Pentium Pro
>
>They claim:
>
>       for 1 Megabyte of plaintext
>       Total Clocks per Byte is 16.1
>
>The key setup time isn't too bad, because:
>
>       for 1 Kilobyte of plaintext
>       Total Clocks per Byte is 24.5
>
>Put another way, an 800 MHz P3 processor is capable of encrypting at
>roughly 50 Megabytes per second, which is pretty darn good IMHO.
>

On a 500 MHz K7 I can encrypt at 43.84 Mbyte/sec using my RC5
code (bragging rights continue).  On an 800 MHz K7 I could
encrypt at 70.15 Mbytes/sec.  Which reaffirms my liking for RC5
(despite the nasty patent thing, which doesn't apply to me, but
which I respect anyway just to be civil).

I still wanna see this bf code :).  Does anyone have a copy?

Tom



------------------------------

Subject: Re: Matrix reduction
From: Chris Card <[EMAIL PROTECTED]>
Date: Mon, 22 May 2000 05:17:14 -0700

In article <8ga3go$gac$[EMAIL PROTECTED]>,
[EMAIL PROTECTED] (Scott Contini) wrote:
>In article <8g7sag$ruj$[EMAIL PROTECTED]>,
>Scott Contini <[EMAIL PROTECTED]> wrote:
>>In article <[EMAIL PROTECTED]>,
>>Chris Card  <[EMAIL PROTECTED]> wrote:
>>>I understand that the matrix reduction step of modern
>>>factorisation algorithms is usually done using the Block Lanczos
>>>algorithm (I don't understand the algorithm yet though ...).
>>>Has anyone looked at the possibility of using hillclimbing
>>>methods to find a linear dependency in the matrix? I haven't
>>>given it much thought, but on the face of it there is a large
>>>solution space with many possible solutions and an obvious metric
>>>(number of non-zero entries). Or is this complete rubbish? :-)
>>>
>>>Chris
>>>
>>
>>I'm not an expert on all the tricks that can be used for matrix
>>reduction, but I have quite a bit of experience implementing the
>>block Lanczos and Wiedemann algorithms and I can tell you the
>>benefits:
>>
>>For a matrix that is (approximately)  n  by  n  with an average of
>>d  nonzero entries per row, the running time is  O(n^2 * d) .
>>Factoring matrices are pretty sparse - usually  d  ranges from
>>approx 10 to approx 65 (see
>>http://www.maths.usyd.edu.au:8000/u/contini/factoring/FactorAnnouncements.html).
>>But you get one other nice benefit: there is a constant that the
>>big O notation hides, which is  1/w  where  w  is the word size
>>of the computer.  Thus, for a computer with 64-bit words, you
>>can expect to solve the matrix in about  n^2 * d / 64  operations.
>>
>>The main disadvantage of block Lanczos is that there isn't a great
>>way to parallelize it, or at least I'm not aware of one yet.  I
>>know many people are thinking about this subject, however!
>>
>
>David Molnar pointed out to me that this is not quite correct -
>Peter Montgomery recently gave a talk on how to implement block
>Lanczos on a mesh-connected hypercube.  Also, Arjen Lenstra and
>myself did a parallel implementation of both Block Lanczos and
>Block Wiedemann on a MasPar, but our implementations were limited to
>factoring matrices with around 2 million rows and columns due to
>memory constraints (Peter Montgomery also helped with the MasPar
>block Lanczos implementation).
>
>Anyway, I'll rephrase my claim to: "parallel implementations of
>block Lanczos seem to require a tightly coupled network where
>communication is fast".  I think the best bet is to have several
>computers connected via a high speed switch.  But I am no longer
>up to date on the recent results, so Peter Montgomery is probably
>the best person to comment on this.  Hopefully he is reading the
>newsgroup!
>
>Scott
>
[apologies if this appears twice ...]
Thanks for that Scott, that's very useful information, but it
doesn't actually answer my question :-)
My (half-baked) idea is to take a random linear combination of the
vectors, score it (score = number of non-zero entries), then
change the linear combination slightly, looking for a lower score,
until a local minimum is reached. Repeat until the local minimum is
zero. From a space point of view this needs to store the matrix
once + the linear combination + a vector to keep track of
the linear combination, which doesn't sound too bad. I've no idea
of the chances of this method finding a solution or how long
it would take to do so.

Chris



------------------------------

From: [EMAIL PROTECTED] (Mark Wooding)
Subject: Re: Practical Blowfish Usage C++ Implementation
Date: 22 May 2000 16:16:12 GMT

[EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

> More alarming,  the ciphertext produced by the Encode() method,
> repeats and  does not appear random to me.

Blowfish is a 64-bit block cipher.  That is, it is a function which,
under the control of a key, takes 64-bit blocks and gives you other
64-bit blocks back.  It only works on 64-bit blocks.

Your message has the 64-bit block `12345678' repeated a number of
times.  Passing it to the cipher each time gives the same answer, so,
unsurprisingly, you get the same 64-bit ciphertexts back.

To see that this is true, use `123456712345671234...' as a test string
instead.  Since it repeats with period 56 bits, not 64 bits, you'll see
a nonrepeating pattern of ciphertext, at least for the first lcm(7, 8) =
56 bytes, i.e., 7 blocks.  (Note that, usually, the world will explode
if you give it a plaintext which isn't a multiple of 64 bits in length.)

To make the problem go away once and for all, use cipher block chaining
(CBC), if you can.  Most implementations provide a CBC mode.  If not, I
suspect that both Tom St Denis's CryptoBag implementation (which I'm
sure he'll not hesitate to provide a reference to) and my own Catacomb
(http://www.excessus.demon.co.uk/misc-hacks/catacomb-1.0.0.tar.gz)
implement CBC.  Catacomb provides ciphertext stealing, which solves the
global detonation problem alluded to above -- it can cope gracefully
with oddly-sized plaintexts without resorting to kludges like padding.

-- [mdw]

------------------------------

From: Jerry Coffin <[EMAIL PROTECTED]>
Subject: Re: pentium timings
Date: Mon, 22 May 2000 10:18:50 -0600

In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] 
says...

[ ... ] 

> If you maintain a webpage, I suggest that you put it there for convenient
> access by other people.

I'm perfectly happy to provide it to anybody who wants it, but I 
don't maintain a web page.  Perhaps one of these days I'll get around 
to doing that, but no such thing exists at the present time.

-- 
    Later,
    Jerry.
 
The universe is a figment of its own imagination.

------------------------------

Subject: Re: Matrix reduction
From: Chris Card <[EMAIL PROTECTED]>
Date: Mon, 22 May 2000 02:50:58 -0700

In article <8ga3go$gac$[EMAIL PROTECTED]>,
[EMAIL PROTECTED] (Scott Contini) wrote:
>In article <8g7sag$ruj$[EMAIL PROTECTED]>,
>Scott Contini <[EMAIL PROTECTED]> wrote:
>>In article <[EMAIL PROTECTED]>,
>>Chris Card  <[EMAIL PROTECTED]> wrote:
>>>I understand that the matrix reduction step of modern
>>>factorisation algorithms is usually done using the Block Lanczos
>>>algorithm (I don't understand the algorithm yet though ...).
>>>Has anyone looked at the possibility of using hillclimbing
>>>methods to find a linear dependency in the matrix? I haven't
>>>given it much thought, but on the face of it there is a large
>>>solution space with many possible solutions and an obvious metric
>>>(number of non-zero entries). Or is this complete rubbish? :-)
>>>
>>>Chris
>>>
>>
>>I'm not an expert on all the tricks that can be used for matrix
>>reduction, but I have quite a bit of experience implementing the
>>block Lanczos and Wiedemann algorithms and I can tell you the
>>benefits:
>>
>>For a matrix that is (approximately)  n  by  n  with an average of
>>d  nonzero entries per row, the running time is  O(n^2 * d) .
>>Factoring matrices are pretty sparse - usually  d  ranges from
>>approx 10 to approx 65 (see
>>http://www.maths.usyd.edu.au:8000/u/contini/factoring/FactorAnnouncements.html).
>>But you get one other nice benefit: there is a constant that the
>>big O notation hides, which is  1/w  where  w  is the word size
>>of the computer.  Thus, for a computer with 64-bit words, you
>>can expect to solve the matrix in about  n^2 * d / 64  operations.
>>
>>The main disadvantage of block Lanczos is that there isn't a great
>>way to parallelize it, or at least I'm not aware of one yet.  I
>>know many people are thinking about this subject, however!
>>
>
>David Molnar pointed out to me that this is not quite correct -
>Peter Montgomery recently gave a talk on how to implement block
>Lanczos on a mesh-connected hypercube.  Also, Arjen Lenstra and
>myself did a parallel implementation of both Block Lanczos and
>Block Wiedemann on a MasPar, but our implementations were limited to
>factoring matrices with around 2 million rows and columns due to
>memory constraints (Peter Montgomery also helped with the MasPar
>block Lanczos implementation).
>
>Anyway, I'll rephrase my claim to: "parallel implementations of
>block Lanczos seem to require a tightly coupled network where
>communication is fast".  I think the best bet is to have several
>computers connected via a high speed switch.  But I am no longer
>up to date on the recent results, so Peter Montgomery is probably
>the best person to comment on this.  Hopefully he is reading the
>newsgroup!
>
>Scott
Thanks for that info Scott, that's very useful, but it didn't
actually answer my question :-)
The (half-baked) idea I had was to take a random linear
combination of the vectors, score it (= number of non-zero
elements) and then slightly change the combination looking for
lower scores, until a local minimum is reached. Repeat until the
local minimum = zero. From the point of view of memory
usage, this only needs to hold the matrix once + the linear
combination + a vector recording the elements in the linear
combination, which doesn't sound too bad. But I've no idea
how likely this approach is to find a linear dependency, or how
long it's likely to take.

Chris



------------------------------

From: [EMAIL PROTECTED] (Mark Wooding)
Crossposted-To: sci.math
Subject: Re: More on Pi and randomness
Date: 22 May 2000 16:20:43 GMT

Mok-Kong Shen <[EMAIL PROTECTED]> wrote:

> I still believe that on heuristic grounds one could well pass a
> segment of Pi (with a randomly chosen starting point) through a
> sufficiently good encryption algorithm (which mixes up the input)
> and obtain something good enough to be considered random
> for all practical applications.

Surely this is a stone soup.

-- [mdw]

------------------------------

From: [EMAIL PROTECTED] (Mark Wooding)
Subject: Re: AEES-Cascade
Date: 22 May 2000 16:38:38 GMT

tomstd <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] (David A. Wagner) wrote:
>
> >In any case, don't rely on statistical tests for much.  They can
> >sometimes uncover massive flaws in ciphers, but rarely discover
> >anything short of enormous weaknesses, so passing statistical tests
> >is hardly a reliable indicator of cryptographic strength.
> 
> Technically this last paragraph is wrong.

No, it's not.

> Things like differential and linear attacks are nothing more than
> specific statistical anamolalies that are exploited.

(For future reference: `anomalies'.)

This is true.  However, statistical testing, in the usual sense, won't
uncover these weaknesses.  They're discovered by thinking very hard.

> I agree passing simple 1/0 balance tests is not enough, but
> those iterative attacks are nothing more than statistical
> attacks.

There's a difference between a statistical attack, which exploits a
known statistical problem, and a statistical test which seeks to find
out whether a cipher *has* a particular (or general) problem.  The
latter is unlikely to find anything of interest even if there is a real
problem.

-- [mdw]

------------------------------


** FOR YOUR REFERENCE **

The service address, to which questions about the list itself and requests
to be added to or deleted from it should be directed, is:

    Internet: [EMAIL PROTECTED]

You can send mail to the entire list (and sci.crypt) via:

    Internet: [EMAIL PROTECTED]

End of Cryptography-Digest Digest
******************************
