Several recent discussions -- on the lists and in private -- have
brought up the question of how best to parametrize tables. I'd like to
present our current approach here and would appreciate a discussion.

+++ Constraints +++

We want to store a code book with quintillions of entries in a  
compressed form while not exceeding several constraints:

A) Computational power -- At most trillions of computations per decode.
B) Storage -- At most trillions of data words of storage.
C) Disk seeks -- At most tens of millions of seeks per decode.

Besides these physical limits, tables are limited by

D) Mergers -- Collisions are unavoidable but only cause damage when
they appear under the same function.

+++ Optimizations +++

Two optimization techniques exist to remove the major bottlenecks of
plain TMTO tables:

1) 'Distinguished Points' decrease C). DP tables still have too many
mergers, which bounds the maximum size of each table. Smaller tables
are undesirable as splitting the data into 'n' different tables
increases the need for A) by a factor of 'n'.

2) 'Rainbow Tables' decrease D) and A) up to some point because tables
can be larger. Beyond that point, adding more 'colors' to the rainbow
increases A).
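For concreteness, here is a toy sketch of a chain that uses both
techniques: it walks through several rainbow colors, and each color
segment runs until it hits a distinguished point. The step function is
a placeholder, not A5/1, and all sizes are shrunk for illustration:

```python
# Toy sketch of one combined rainbow/DP chain. The chain walks through
# COLORS rainbow colors; within each color it steps until it reaches a
# distinguished point (low DP_BITS bits all zero). Only the (start, end)
# pair would be stored on disk.
# NOTE: the step function is a stand-in, NOT A5/1; sizes are shrunk.

K_BITS = 20        # toy key space of 2^20 (the real one is ~2^61.26)
DP_BITS = 8        # toy distinguished-point length (the post uses 15)
COLORS = 8         # rainbow colors per chain (the post uses 32)
MASK = (1 << K_BITS) - 1

def step(state, color):
    # Placeholder one-way step. Mixing in the color means a collision
    # only merges two chains when it happens under the same color.
    x = (state * 2654435761 + color) & MASK
    return x ^ (x >> 7)

def chain_end(start, max_steps=1 << (DP_BITS + 4)):
    state = start
    for color in range(COLORS):
        for _ in range(max_steps):          # cap to avoid rare loops
            state = step(state, color)
            if state & ((1 << DP_BITS) - 1) == 0:
                break                       # reached a distinguished point
    return state

end = chain_end(12345)
```

The cap on steps per color mirrors what real DP implementations do:
chains that wander too long without hitting a DP are abandoned.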

Combining 1) and 2) is optimal for our application. So far we used 32
'rainbow colors', each ending in a 15-bit distinguished point. After
the recent discoveries we might want to change that to 12 or 13 bits.
Here is why:

+++ Parameters +++

Full calculations are in this Excel sheet: 
http://reflextor.com/Rainbow.Tables.xls

Assume we need to cover 1% of the key space of 2^61.26 (after the
recent breakthrough).

Number of tables * table height * colors per row * points per color =
1% * 2^61.26
or:  n * h * k * l = 2^54.62

These values should be stored compressed in 2 TB (i.e. 2^37 chains at
16 bytes per stored (start, end) pair). That means:

Number of tables * table height = 2 TB
or:  n * h = 2^37
=> k * l = 2^17.62
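As a sanity check, the two parameter sets quoted in the Results section
below can be tested against both budgets. Everything is done in log2
exponents; the constants restate the coverage and storage constraints:

```python
# Sanity-check the parameter bookkeeping: coverage n*h*k*l = 2^54.62
# and storage n*h = 2^37 together imply k*l = 2^17.62. All numbers are
# log2 exponents taken from the Results section.

COVERAGE = 54.62   # log2 of 1% of the 2^61.26 key space
STORAGE = 37.00    # log2 of the number of stored chains (2 TB)

# (log2 k, log2 l, log2 h, log2 n) for the k = 8 and k = 32 cases
params = [(3, 14.62, 27.43, 9.57),
          (5, 12.62, 29.48, 7.52)]

checks = [(k + l + h + n, h + n, k + l) for (k, l, h, n) in params]
```

Both rows sum to 54.62 for coverage, 37.00 for storage, and 17.62 for
k*l, so the two published triples are consistent with the constraints.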

We want 'n' to be as small as possible but are bounded by mergers
(which occur more frequently as the tables grow). The point at which a
fraction 'c' of new chains is thrown away is at:

(1 - h*l^2 / Z)^k = (1 - c)
=> h*l^2 = Z (1 - (1-c)^(1/k) )

Where 'Z' is the inverse of the collision probability of A5/1, which I
conservatively assume to be (1/2)*2^61.26. Setting 'c' to 50%, we get
solution triples <h,l,n> for each number of colors 'k' we choose.
These differ in their GPU and HDD requirements.
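Plugging in the numbers (a quick Python check; 'Z' is the value given
above, which reproduces the table heights quoted in the Results
section):

```python
# Solve the merger bound h * l^2 = Z * (1 - (1-c)^(1/k)) for the table
# height h, for the two color counts discussed in the text.
from math import log2

Z = 0.5 * 2 ** 61.26   # inverse collision probability of A5/1
c = 0.50               # accept 50% of new chains being thrown away

def log2_height(k, log2_l):
    h = Z * (1 - (1 - c) ** (1 / k)) / 2 ** (2 * log2_l)
    return round(log2(h), 2)

h8 = log2_height(8, 14.62)     # k = 8 colors
h32 = log2_height(32, 12.62)   # k = 32 colors
```

This gives h = 2^27.43 for k = 8 and h = 2^29.48 for k = 32, matching
the Results below.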

Computations:  T = n * k(k+1)/2 * l   (for each try; hundreds of tries
will be needed)
HDD seeks:     H = n * k   (for each try)
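Evaluating both formulas for the two parameter sets gives the per-try
costs (Python sketch, log2 throughout; these are per single try, before
multiplying by the number of tries):

```python
# Evaluate the per-try lookup cost for both parameter sets:
# T = n * k(k+1)/2 * l computations and H = n * k disk seeks.
from math import log2

def per_try(k, log2_l, log2_n):
    t = log2_n + log2(k * (k + 1) / 2) + log2_l   # log2 of computations
    h = log2_n + log2(k)                          # log2 of seeks
    return round(t, 2), round(h, 2)

t8, h8 = per_try(8, 14.62, 9.57)      # 32 colors -> see k = 32 below
t32, h32 = per_try(32, 12.62, 7.52)
```

Per try this is roughly 2^29.4 computations and 2^12.6 seeks for k = 8,
and 2^29.2 computations and 2^12.5 seeks for k = 32.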

+++ Results +++

k =  8;  l = 2^14.62;  h = 2^27.43;  n = 2^9.57
T = 2^37.70;  H = 2^20.91

k = 32;  l = 2^12.62;  h = 2^29.48;  n = 2^7.52
T = 2^37.25;  H = 2^20.86

The difference between these choices is small, so the exact number of
'colors' is not crucial. The speed-up over not using rainbow colors is
3 in this case, and much larger when using a larger key space (as
before) or less storage.

Play with these values in the Excel sheet linked above. For example,
increase 'c' to simulate the effect of spending more computational
time upfront.

Looking forward to the discussion! 
_______________________________________________
A51 mailing list
[email protected]
http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
