Cryptography-Digest Digest #832

Digestifier Sun, 21 May 2000 19:09:41 -0700
Cryptography-Digest Digest #832, Volume #11      Sun, 21 May 00 22:13:00 EDT

Contents:
  Re: Blowfish Claims ... something not right (tomstd)
  Re: Blowfish Claims ... something not right (Paul Rubin)
  Re: Blowfish Claims ... something not right (tomstd)
  Re: Blowfish Claims ... something not right (Paul Rubin)
  Re: pentium timings (Paul Schlyter)
  Re: pentium timings (tomstd)
  Re: Blowfish Claims ... something not right (tomstd)
  Re: Reasonably secure OTP passing (John Savard)
  conceptually simple cipher (tomstd)
  Blowfish again.... (tomstd)
  Re: AEES-Cascade (David A. Wagner)
  Re: Matrix reduction (Scott Contini)
  Re: Blowfish again.... ("Adam Durana")
  Re: AEES-Cascade (David A. Wagner)
  Re: Interpretation of Hitachi patent claims (Jerry Coffin)
  Re: AEES-Cascade (tomstd)
  Re: Blowfish again.... (tomstd)
  Re: pentium timings (Jerry Coffin)
  Re: pentium timings (tomstd)
  Re: conceptually simple cipher (tomstd)

----------------------------------------------------------------------------

Subject: Re: Blowfish Claims ... something not right
From: tomstd <[EMAIL PROTECTED]>
Date: Sun, 21 May 2000 15:18:26 -0700

In article <y5ZV4.2188
$[EMAIL PROTECTED]>, "Kasper Pedersen"
<[EMAIL PROTECTED]> wrote:
>"tomstd" <[EMAIL PROTECTED]> wrote in message
>news:[EMAIL PROTECTED]...
>> According to [1] Blowfish on a pentium pro takes 9 cycles per
>> round.  I would really like to see that code that runs in 9
>> cycles, that does three xors, 1 add and 5 lookups.
>>
>> Somehow either the guy is a genius, or he is lying.
>
>Nope. 9 clocks means 18 execution opportunities on the pentium-
1. It _just_
>fits in 9
>clocks. I tried, and it really works.

Is the code on the net anywhere?

>on the K6/k6-2/-3 and P2/3 it's slightly slower because of the
architecture
>(they sacrificed something to get higher clock frequencies)
>Mine runs at 11 clocks/round on the K6-2, 13 on the P2, 12 on
the P3.
>I didn't have a P2/P3 to develop on, so it might be possible to
squeeze a
>bit more.
>The Athlon does it in 25% less cycles than the P3, which
translates to a 33%
>advantage.

So it's about ~22 cycles a byte on a K6/P2/P3?, well my RC5 code
kicks that hehehehe nyanyanya.

I still would like to see that code if you don't mind.

Tom


* Sent from RemarQ http://www.remarq.com The Internet's Discussion Network *
The fastest and easiest way to search and participate in Usenet - Free!


------------------------------

From: [EMAIL PROTECTED] (Paul Rubin)
Subject: Re: Blowfish Claims ... something not right
Date: 21 May 2000 22:31:56 GMT

In article <[EMAIL PROTECTED]>,
tomstd  <[EMAIL PROTECTED]> wrote:
>So it's about ~22 cycles a byte on a K6/P2/P3?, well my RC5 code
>kicks that hehehehe nyanyanya.
>
>I still would like to see that code if you don't mind.

I think I got around 8 cycles/byte on a P1 by overlapping the
instructions carefully, and using a memory cell (L1 cache) in some
place where it really looked like you wanted a register, but using
memory turned out to save a cycle.  I'll see if I can find the code.

You should be able to do at least as well on a K6/P2/P3, all of which
have more execution units and fewer interlocks than the P1.  The problem
is that optimizing code for the P1 tends to pessimize it on the K6/P[23].
For absolute best performance, you need separate implementations for each
processor.

Some good books on x86 optimization up to the P1:

  The Zen of Assembly Language Programming, by Michael Abrash;
  Pentium Processor Optimization Tools, by Michael Schmit

I may have slightly garbled the titles but you should be able to find
both books on Amazon.  Unfortunately they don't cover the K6/P[23]
(Note, P2 and P3 are basically the same).

------------------------------

Subject: Re: Blowfish Claims ... something not right
From: tomstd <[EMAIL PROTECTED]>
Date: Sun, 21 May 2000 15:51:38 -0700

In article <8g9o4s$kbp$[EMAIL PROTECTED]>,
[EMAIL PROTECTED] (Paul Rubin) wrote:
>In article <[EMAIL PROTECTED]>,
>tomstd  <[EMAIL PROTECTED]> wrote:
>>So it's about ~22 cycles a byte on a K6/P2/P3?, well my RC5
code
>>kicks that hehehehe nyanyanya.
>>
>>I still would like to see that code if you don't mind.
>
>I think I got around 8 cycles/byte on a P1 by overlapping the
>instructions carefully, and using a memory cell (L1 cache) in
some
>place where it really looked like you wanted a register, but
using
>memory turned out to save a cycle.  I'll see if I can find the
code.

8 cycles a byte with blowfish?  That's hard to believe, mainly
because the look ups in the F function would be a source of
bottleneck.

>You should be able to do at least as well on a K6/P2/P3, all of
which
>have more execution units and fewer interlocks than the P1.
The problem
>is that optimizing code for the P1 tends to pessimize it on the
K6/P[23].
>For absolute best performance, you need separate
implementations for each
>processor.

Well dig up the code, I will try it out.

>Some good books on x86 optimization up to the P1:
>
>  The Zen of Assembly Language Programming, by Michael Abrash;
>  Pentium Processor Optimization Tools, by Michael Schmit
>
>I may have slightly garbled the titles but you should be able
to find
>both books on Amazon.  Unfortunately they don't cover the K6/P
[23]
>(Note, P2 and P3 are basically the same).

And there are texts on pentium optimizations on www.x86.org.
However I have already begun todo some ciphers in asm, I have
rc5 and xtea already.  I planned todo blowfish, but it's already
done, supposedly.

Tom


* Sent from RemarQ http://www.remarq.com The Internet's Discussion Network *
The fastest and easiest way to search and participate in Usenet - Free!


------------------------------

From: [EMAIL PROTECTED] (Paul Rubin)
Subject: Re: Blowfish Claims ... something not right
Date: 22 May 2000 00:05:39 GMT

tomstd  <[EMAIL PROTECTED]> wrote:
>>I think I got around 8 cycles/byte on a P1 ...
>
>8 cycles a byte with blowfish?  That's hard to believe, mainly
>because the look ups in the F function would be a source of
>bottleneck.

Wait a sec, I must have meant 8 cycles per *round*.  This is coming
back to me.  That would be 16 cycles/byte.  Oops.  But the table
lookups should each take 1 cycle of 1 pipeline, i.e. 1/2 a clock, if
you overlap other operations carefully.  Since there are 4 lookups per
round that is 4*16/8 = 8 cycles per byte spent in the cipher doing
lookups.  So that leaves another 8 cycles for the other operations,
which isn't so bad.

>And there are texts on pentium optimizations on www.x86.org.
>However I have already begun todo some ciphers in asm, I have
>rc5 and xtea already.  I planned todo blowfish, but it's already
>done, supposedly.

I think RC5 is not very interesting because of the data dependent
rotations (slow operation on any cpu with no barrel shifter) and
because of the patent.  For XTea, it's more in the spirit of the
design to go for minimum code size and ram usage, rather than speed.
DES/3DES have been done to death; Blowfish and the AES candidates are
also done.  You might try CAST-128.  The round function should be the
same as Blowfish but you could optimize the key schedule.  You could
also code XTea and Skipjack for that Atmel microcontroller, going for
a good speed-space tradeoff.

Another useful thing would be some very compact public-key
implementations for the x86 (DH to start with, RSA when the patent
expires).  Asm code is important for public key algorithms because
double-width multiplies speed things up a lot and don't exist in
portable C.  Most current implementations either are written in C for
portability and clarity and are dog slow, or else go all-out for speed
and are large.

If you write something straightforward and compact, it can be very
useful.  Don't try to squeeze out every byte, but just don't bother
with the space-consuming fancy tricks of the ultra-high-speed
implementations.

------------------------------

From: [EMAIL PROTECTED] (Paul Schlyter)
Subject: Re: pentium timings
Date: 22 May 2000 01:52:27 +0200

In article <[EMAIL PROTECTED]>,
Mok-Kong Shen  <[EMAIL PROTECTED]> wrote:
 
> tomstd wrote:
> 
>> Anyone interested in timing their code in cycles on a pentium
>> class computer (i.e k6, 586, ppro, MII, etc...) can use the code
>> I developped (well it's not my idea, I just put it together
>> nicely) here
> 
> A question of curiosity: How do you get the time to sufficient accuracy?
> On a computer, e.g. PC, there are other tasks running concurrently.
 
Perhaps by turning off interrupts while the code to be benchmarked runs?
Of course this requires that the code runs in Ring 0.
 
-- 
================================================================
Paul Schlyter,  Swedish Amateur Astronomer's Society (SAAF)
Grev Turegatan 40,  S-114 38 Stockholm,  SWEDEN
e-mail:  pausch at saaf dot se   or    paul.schlyter at ausys dot se
WWW:     http://hotel04.ausys.se/pausch    http://welcome.to/pausch

------------------------------

Subject: Re: pentium timings
From: tomstd <[EMAIL PROTECTED]>
Date: Sun, 21 May 2000 17:45:56 -0700

In article <8g9srr$m34$[EMAIL PROTECTED]>, [EMAIL PROTECTED]
(Paul Schlyter) wrote:
>In article <[EMAIL PROTECTED]>,
>Mok-Kong Shen  <[EMAIL PROTECTED]> wrote:
>
>> tomstd wrote:
>>
>>> Anyone interested in timing their code in cycles on a pentium
>>> class computer (i.e k6, 586, ppro, MII, etc...) can use the
code
>>> I developped (well it's not my idea, I just put it together
>>> nicely) here
>>
>> A question of curiosity: How do you get the time to
sufficient accuracy?
>> On a computer, e.g. PC, there are other tasks running
concurrently.
>
>Perhaps by turning off interrupts while the code to be
benchmarked runs?
>Of course this requires that the code runs in Ring 0.

In win98 I do get a variance (which is why i take averages) but
I find if I use 'cli/sti' around the body of the timer code that
it hogs the cpu big time... (i.e other tasks just stop).  Which
leads me to think Win98 allows dpmi tasks to control the cpu
flags... I wonder..
mov eax,cr0
and eax,~1
mov cr0,eax

hehehehe..

Tom


* Sent from RemarQ http://www.remarq.com The Internet's Discussion Network *
The fastest and easiest way to search and participate in Usenet - Free!


------------------------------

Subject: Re: Blowfish Claims ... something not right
From: tomstd <[EMAIL PROTECTED]>
Date: Sun, 21 May 2000 17:51:48 -0700

In article <8g9tkj$l4o$[EMAIL PROTECTED]>,
[EMAIL PROTECTED] (Paul Rubin) wrote:
>tomstd  <[EMAIL PROTECTED]> wrote:
>>>I think I got around 8 cycles/byte on a P1 ...
>>
>>8 cycles a byte with blowfish?  That's hard to believe, mainly
>>because the look ups in the F function would be a source of
>>bottleneck.
>
>Wait a sec, I must have meant 8 cycles per *round*.  This is
coming
>back to me.  That would be 16 cycles/byte.  Oops.  But the table
>lookups should each take 1 cycle of 1 pipeline, i.e. 1/2 a
clock, if
>you overlap other operations carefully.  Since there are 4
lookups per
>round that is 4*16/8 = 8 cycles per byte spent in the cipher
doing
>lookups.  So that leaves another 8 cycles for the other
operations,
>which isn't so bad.

That sounds a bit better :)

>>And there are texts on pentium optimizations on www.x86.org.
>>However I have already begun todo some ciphers in asm, I have
>>rc5 and xtea already.  I planned todo blowfish, but it's
already
>>done, supposedly.
>
>I think RC5 is not very interesting because of the data
dependent
>rotations (slow operation on any cpu with no barrel shifter) and
>because of the patent.  For XTea, it's more in the spirit of the
>design to go for minimum code size and ram usage, rather than
speed.
>DES/3DES have been done to death; Blowfish and the AES
candidates are
>also done.  You might try CAST-128.  The round function should
be the
>same as Blowfish but you could optimize the key schedule.  You
could
>also code XTea and Skipjack for that Atmel microcontroller,
going for
>a good speed-space tradeoff.

RC5 was fun todo, just to prove it can be done fast on the
lastest x86 systems.  Almost everyone I know has a K6-2/PII or
above (I have a quote for K6-2 at 70$).  So I decided todo it.
15 cycles/byte is rather quick for a block cipher, espescially
on the K7 where I get 11 cycles/byte...

>Another useful thing would be some very compact public-key
>implementations for the x86 (DH to start with, RSA when the
patent
>expires).  Asm code is important for public key algorithms
because
>double-width multiplies speed things up a lot and don't exist in
>portable C.  Most current implementations either are written in
C for
>portability and clarity and are dog slow, or else go all-out
for speed
>and are large.
>
>If you write something straightforward and compact, it can be
very
>useful.  Don't try to squeeze out every byte, but just don't
bother
>with the space-consuming fancy tricks of the ultra-high-speed
>implementations.

How about a DJGPP callable long int math library?  Sounds like
fun.

Tom



* Sent from RemarQ http://www.remarq.com The Internet's Discussion Network *
The fastest and easiest way to search and participate in Usenet - Free!


------------------------------

From: [EMAIL PROTECTED] (John Savard)
Subject: Re: Reasonably secure OTP passing
Date: Mon, 22 May 2000 01:06:25 GMT

On Sun, 21 May 2000 22:32:25 +0200, Mok-Kong Shen
<[EMAIL PROTECTED]> wrote, in part:

>I think that within the framework of practical requirement, i.e. we
>don't need 'absolute' security, the two OTPs could be replaced
>by something that do not exactly qualify being (ideal) OTP.

Then one doesn't need to bother sending it, one can just use this
other thing as a normal stream cipher component. We still are falling
short of a true OTP, since the random keystream is being transmitted
in a way that is subject to interception and (when correlated with the
regular messages with which it is used) cryptanalysis.

I think the question of using true random numbers for message
splitting, and under what circumstances that produces some genuine
improvement in security, is one worth looking into.

John Savard (teneerf <-)
http://www.ecn.ab.ca/~jsavard/

------------------------------

Subject: conceptually simple cipher
From: tomstd <[EMAIL PROTECTED]>
Date: Sun, 21 May 2000 18:22:00 -0700

I did a conceptually simple (albeit not too fast) cipher as the
following

(a, b) = plaintext
for r = 0 to 15 (inclusive) do
   a = a + sbox[(r + (b mod 257)) mod 256]
   swap(a, b)
next r

Where 'sbox' is a 256 element 32-bit word private array.

Any comments or ideas?

Please and thankyou.

Tom

* Sent from RemarQ http://www.remarq.com The Internet's Discussion Network *
The fastest and easiest way to search and participate in Usenet - Free!


------------------------------

Subject: Blowfish again....
From: tomstd <[EMAIL PROTECTED]>
Date: Sun, 21 May 2000 18:26:17 -0700

Not to be mean, but where exactly is Bruce's analysis of
Blowfish?  He did extensive attacks on Twofish, but nothing on
Blowfish?

I don't want to attack him (so this is not to be rude).  I just
want to know if he has any further insight into the cipher (6
years later....)

Thanks,
Tom

* Sent from RemarQ http://www.remarq.com The Internet's Discussion Network *
The fastest and easiest way to search and participate in Usenet - Free!


------------------------------

From: [EMAIL PROTECTED] (David A. Wagner)
Subject: Re: AEES-Cascade
Date: 21 May 2000 18:40:48 -0700

In article <8fu7nf$39f$[EMAIL PROTECTED]>,  <[EMAIL PROTECTED]> wrote:
> Performance of AEES-Cascade excluding I/O operations was measured
> sending for encryption 410Mb in a loop. It is  931 Kb/sec.

You realize that this is significantly slower than 3DES, I hope?

------------------------------

From: [EMAIL PROTECTED] (Scott Contini)
Subject: Re: Matrix reduction
Date: 22 May 2000 01:46:00 GMT

In article <8g7sag$ruj$[EMAIL PROTECTED]>,
Scott Contini <[EMAIL PROTECTED]> wrote:
>In article <[EMAIL PROTECTED]>,
>Chris Card  <[EMAIL PROTECTED]> wrote:
>>I understand that the matrix reduction step of modern
>>factorisation algorithms is usually done using the Block Lanczos
>>algorithm (I don't understand the algorithm yet though ...).
>>Has anyone looked at the possibility of using hillclimbing
>>methods to find a linear dependency in the matrix? I haven't
>>given it much thought, but on the face of it there is a large
>>solution space with many possible solutions and an obvious metric
>>(number of non-zero entries). Or is this complete rubbish? :-)
>>
>>Chris
>>
>>* Sent from RemarQ http://www.remarq.com The Internet's Discussion Network *
>>The fastest and easiest way to search and participate in Usenet - Free!
>>
>
>
>I'm not an expert on all the tricks that can be used for matrix
>reduction, but I have quite a bit of experience implementing the
>block Lanczos and Wiedemann algorithms and I can tell you the
>benefits:
>
>For a matrix that is (approximately)  n  by  n  with an average of
>d  nonzero entries per row, the running time is  O(n^2 * d) .  Factoring
>matrices are pretty sparse - usually  d  ranges from approx 10 to
>approx 65 
>(see http://www.maths.usyd.edu.au:8000/u/contini/factoring/FactorAnnouncements.html).
>But you get one other nice benefit: there is a constant that the
>big O notation hides, which is  1/w  where  w  is the word size
>of the computer.  Thus, for a computer with 64-bit words, you
>can expect to solve the matrix in about  n^2 * d / 64  operations.
>
>The main disadvantage of block Lanczos is that there isn't a great
>way to parallelize it, or at least I'm not aware of one yet.  I
>know many people are thinking about this subject, however!
>

David Molnar pointed out to me that this is not quite correct -
Peter Montgomery recently gave a talk on how to implement block
Lanczos on a mesh-connected hypercube.  Also, Arjen Lenstra and
myself did a parallel implementation of both Block Lanczos and
Block Wiedemann on a MasPar, but our implementations were limited to
factoring matrices with around 2 million rows and columns due to
memory constraints (Peter Montgomery also helped with the MasPar
block Lanczos implementation).

Anyway, I'll rephrase my claim to: "parallel implementations of
block Lanczos seem to require a tightly coupled network where
communication is fast".  I think the best bet is to have several
computers connected via a high speed switch.  But I am no longer
up to date on the recent results, so Peter Montgomery is probably
the best person to comment on this.  Hopefully he is reading the
newsgroup!

Scott




------------------------------

From: "Adam Durana" <[EMAIL PROTECTED]>
Subject: Re: Blowfish again....
Date: Sun, 21 May 2000 21:42:21 -0400

> I don't want to attack him (so this is not to be rude).  I just
> want to know if he has any further insight into the cipher (6
> years later....)

Or just the analysis from when he first published the cipher.  I am sure he
did analysis, it would be nice if he included that on his blowfish page on
the web site.

- Adam



------------------------------

From: [EMAIL PROTECTED] (David A. Wagner)
Subject: Re: AEES-Cascade
Date: 21 May 2000 18:46:11 -0700

In article <8fu7nf$39f$[EMAIL PROTECTED]>,  <[EMAIL PROTECTED]> wrote:
> A desirable property of any encryption algorithm is that a small
> change in either the plaintext or the key should produce a
> significant change in the ciphertext. For 64-bit block DES change
> only one bit gives 34 bit change in ciphertext.

That would be a serious weakness in DES if it were true,
but I think the error is in your testing methodology.
Increase the number of trials you do by about 100x, and
see if that number doesn't get much closer to 32 bits.

> To be able to compare AEES-Cascade implementation (640-bit) with
> DES (64-bit) we should change the same amount of information namely
> 10 bits. Two ciphertexts encrypted with AEES-Cascade differ in 303
> bits, which makes 47,3%.

Again, if this were the true rate, this would be a serious
weakness in AEES-Cascade, but I suspect you are again not
running the test on enough plaintext-pairs.

> With two keys that differ in only one bit position in DES
> we have about 50% of the bits in the ciphertext differ.
> In AEES we are using 2048 bits key. To be able to compare
> key avalanche effect with DES we should change 37 bits.

I hate to say it, but this paragraph reflects a serious
misunderstanding of the point of the avalanche test....
You could try it with changing one bit of the AEES key
and your results would be more meaningful.


In any case, don't rely on statistical tests for much.
They can sometimes uncover massive flaws in ciphers, but
rarely discover anything short of enormous weaknesses, so
passing statistical tests is hardly a reliable indicator
of cryptographic strength.

------------------------------

From: Jerry Coffin <[EMAIL PROTECTED]>
Subject: Re: Interpretation of Hitachi patent claims
Date: Sun, 21 May 2000 19:50:07 -0600

In article <[EMAIL PROTECTED]>, [EMAIL PROTECTED] 
says...

[ I had asked about the numbers of RSA's patents on RC5 ] 

Looking at things, it appears that they're 5,835,600 and 5,724,428.  
It's possible that the Hitachi patents invalidate the RSA DSI 
patents, but not the reverse -- the Hitachi patents are substantially 
older (both were issued before either of the RC5 patents was applied 
for).  I doubt the Hitachi patents affect the RC5 patents though: the 
RC5 patents specifically call for _data dependent_ rotations, which 
is what appears to be new and unique.

It could be that you have to use the Hitachi patents to implement 
RC5, but I haven't tried to figure out for sure yet...

-- 
    Later,
    Jerry.
 
The universe is a figment of its own imagination.

------------------------------

Subject: Re: AEES-Cascade
From: tomstd <[EMAIL PROTECTED]>
Date: Sun, 21 May 2000 18:54:30 -0700

In article <8ga3h3$58q$[EMAIL PROTECTED]>,
[EMAIL PROTECTED] (David A. Wagner) wrote:
>In article <8fu7nf$39f$[EMAIL PROTECTED]>,  <aernst331@my-
deja.com> wrote:
>> A desirable property of any encryption algorithm is that a
small
>> change in either the plaintext or the key should produce a
>> significant change in the ciphertext. For 64-bit block DES
change
>> only one bit gives 34 bit change in ciphertext.
>
>That would be a serious weakness in DES if it were true,
>but I think the error is in your testing methodology.
>Increase the number of trials you do by about 100x, and
>see if that number doesn't get much closer to 32 bits.
>
>> To be able to compare AEES-Cascade implementation (640-bit)
with
>> DES (64-bit) we should change the same amount of information
namely
>> 10 bits. Two ciphertexts encrypted with AEES-Cascade differ
in 303
>> bits, which makes 47,3%.
>
>Again, if this were the true rate, this would be a serious
>weakness in AEES-Cascade, but I suspect you are again not
>running the test on enough plaintext-pairs.
>
>> With two keys that differ in only one bit position in DES
>> we have about 50% of the bits in the ciphertext differ.
>> In AEES we are using 2048 bits key. To be able to compare
>> key avalanche effect with DES we should change 37 bits.
>
>I hate to say it, but this paragraph reflects a serious
>misunderstanding of the point of the avalanche test....
>You could try it with changing one bit of the AEES key
>and your results would be more meaningful.
>
>
>In any case, don't rely on statistical tests for much.
>They can sometimes uncover massive flaws in ciphers, but
>rarely discover anything short of enormous weaknesses, so
>passing statistical tests is hardly a reliable indicator
>of cryptographic strength.

Technically this last paragraph is wrong.  Things like
differential and linear attacks are nothing more then specific
statistical anamolalies that are exploited.

I agree passing simple 1/0 balance tests is not enough, but
those itterative attacks are nothing more then statistical
attacks.

Tom


* Sent from RemarQ http://www.remarq.com The Internet's Discussion Network *
The fastest and easiest way to search and participate in Usenet - Free!


------------------------------

Subject: Re: Blowfish again....
From: tomstd <[EMAIL PROTECTED]>
Date: Sun, 21 May 2000 18:55:28 -0700

In article <0f0W4.2416$[EMAIL PROTECTED]>, "Adam
Durana" <[EMAIL PROTECTED]> wrote:
>> I don't want to attack him (so this is not to be rude).  I
just
>> want to know if he has any further insight into the cipher (6
>> years later....)
>
>Or just the analysis from when he first published the cipher.
I am sure he
>did analysis, it would be nice if he included that on his
blowfish page on
>the web site.

He should have included it then.

Tom


* Sent from RemarQ http://www.remarq.com The Internet's Discussion Network *
The fastest and easiest way to search and participate in Usenet - Free!


------------------------------

From: Jerry Coffin <[EMAIL PROTECTED]>
Subject: Re: pentium timings
Date: Sun, 21 May 2000 20:00:48 -0600

In article <[EMAIL PROTECTED]>, 
[EMAIL PROTECTED] says...
> Anyone interested in timing their code in cycles on a pentium
> class computer (i.e k6, 586, ppro, MII, etc...) can use the code
> I developped (well it's not my idea, I just put it together
> nicely) here

Unfortunately, you didn't put it together particularly well at all.  
RDTSC is NOT a serializing instruction, which means it can be 
executed out of order on a K5 or above or a Pentium Pro or above.  
That means it could execute part of the code you're timing before it 
executes the first RDTSC and/or could execute the final RDTSC before 
it's finished executing all the code you're trying to time.  To make 
a long story short, the code will produce inconsistent and inaccurate 
results.

If you want some timing code that actually works, I'll be happy to 
send it to you.  Alternatively, I'm pretty sure I've posted it to 
comp.lang.asm.x86, so a search on deja.com should turn it up.  Much 
like cryptography, making this work correctly can be substantially 
more work than most people expect.

-- 
    Later,
    Jerry.
 
The universe is a figment of its own imagination.

------------------------------

Subject: Re: pentium timings
From: tomstd <[EMAIL PROTECTED]>
Date: Sun, 21 May 2000 19:06:08 -0700

In article <[EMAIL PROTECTED]>,
Jerry Coffin <[EMAIL PROTECTED]> wrote:
>In article <[EMAIL PROTECTED]>,
>[EMAIL PROTECTED] says...
>> Anyone interested in timing their code in cycles on a pentium
>> class computer (i.e k6, 586, ppro, MII, etc...) can use the
code
>> I developped (well it's not my idea, I just put it together
>> nicely) here
>
>Unfortunately, you didn't put it together particularly well at
all.
>RDTSC is NOT a serializing instruction, which means it can be
>executed out of order on a K5 or above or a Pentium Pro or
above.
>That means it could execute part of the code you're timing
before it
>executes the first RDTSC and/or could execute the final RDTSC
before
>it's finished executing all the code you're trying to time.  To
make
>a long story short, the code will produce inconsistent and
inaccurate
>results.
>
>If you want some timing code that actually works, I'll be happy
to
>send it to you.  Alternatively, I'm pretty sure I've posted it
to
>comp.lang.asm.x86, so a search on deja.com should turn it up.
Much
>like cryptography, making this work correctly can be
substantially
>more work than most people expect.

I will go ahead and disagree with you now.  When timing my RC5
code I always get the same clocks per block output, no matter
what is running.

What you have todo is call 'timer(void *)' more then once and
take the average to get a good idea of what's going on.

Even still if the rdtsc is part of the start/ending of the
tested code, that's why it's called 512 times.  So the majority
of the used cycles is in the called code and not the timer code.

This code is for estimating and profiling, not for exact
measurements.

Tom


* Sent from RemarQ http://www.remarq.com The Internet's Discussion Network *
The fastest and easiest way to search and participate in Usenet - Free!


------------------------------

Subject: Re: conceptually simple cipher
From: tomstd <[EMAIL PROTECTED]>
Date: Sun, 21 May 2000 19:07:43 -0700

In article <[EMAIL PROTECTED]>, tomstd
<[EMAIL PROTECTED]> wrote:
>I did a conceptually simple (albeit not too fast) cipher as the
>following
>
>(a, b) = plaintext
>for r = 0 to 15 (inclusive) do
>   a = a + sbox[(r + (b mod 257)) mod 256]
>   swap(a, b)
>next r
>

I changed it to use an array S[0..256+15] and did this

for r = 0 to 15 do
    a = a + sbox[((sbox[256+r] + b) mod 257) mod 256]
    swap(a, b)
next r

Source code is at

http://www.tomstdenis.com/cipher3.c

Tom


* Sent from RemarQ http://www.remarq.com The Internet's Discussion Network *
The fastest and easiest way to search and participate in Usenet - Free!


------------------------------


** FOR YOUR REFERENCE **

The service address, to which questions about the list itself and requests
to be added to or deleted from it should be directed, is:

    Internet: [EMAIL PROTECTED]

You can send mail to the entire list (and sci.crypt) via:

    Internet: [EMAIL PROTECTED]

End of Cryptography-Digest Digest
******************************
Cryptography-Digest Digest #832

Reply via email to