On 1/11/10 5:09 AM, sascha wrote:
> On Mon, Jan 11, 2010 at 12:11:53AM +0100, M vd S wrote:
>
>> On 1/10/10 8:41 PM, sascha wrote:
>>
>>> On Sun, Jan 10, 2010 at 05:15:46PM +0100, M vd S wrote:
>>>
>>>> Apart from the optimization, we can also observe that the state
>>>> space converges to 2^64 * (1-84%) before generating the keystream.
>>>> This must also mean that only 16% of 2^64 possible 64 bit keystream
>>>> can exist. I wouldn't know how this can be of value to us right now,
>>>> but it is worth noting.
>>>>
>>> well it is of great value: our tables need only be 16% the size they were
>>> before. at the cost of 2-3 times more computational effort.
>>>
>>>
>> I aimed at the fact that we will not see all possible 64 bit
>> keystreams in practice. So there's some redundancy hidden in there.
>>
>> Looking at the last 64 bits of keystream used to cipher the first
>> 114 bits, one in ten combinations don't exist. That's a weakness of
>>
>
> you meant 9 in 10!?
>
>
Yes, correct!
>> the cipher in itself, as if we would know that bits A B and C are
>> always one. Which - if it were that simple - would allow us to say
>> something about the contents of the burst.
>>
>
> the strength of the cipher is reduced by a factor of 6, now you have
> a 2^61.5 state cipher, thats still pretty complex.
>
>
A factor of 6 to 16, depending on which part of the 2*114-bit keystream
you look at.
But like I said, I don't see any clear way to generate or detect illegal
keystreams. Maybe someone here has an idea.
..
>> Anyway. The table now looks like:
>>   N    S    cpu cost   table eff   total eff
>>   0   84%     1.00        1.00        1.00
>>   1   75%     1.02        1.56        1.54
>>   2   72%     1.03        1.75        1.70
>>   4   68%     1.06        2.00        1.88
>>   8   63%     1.13        2.31        2.06
>>  16   53%     1.25        2.94        2.35
>>  32   37%     1.47        3.94        2.68
>>  64   12%     1.89        5.50        2.91
>>  96    0%     2.30        6.25        2.72
>>
>> So I would say we should restate our cracking goal from:
>>
>> "a mapping from the internal state of the A5/1 algorithm (64 bits)
>> to the first 64 bits of keystream that get generated from that
>> initial internal state. "
>>
>> to
>>
>> "a mapping from the internal state of the A5/1 algorithm (64 bits)
>> _after the loading of Kc and the frame number_ to the first 64 bits
>> of keystream that get generated from that initial internal state. "
>>
>>
>
> you mean:
> "a mapping from the internal state of the A5/1 algorithm (64 bits)
> that can occur after 100 initial rounds to the first ..."
> Kc and frame number are clocked in regularly.
> for those interested in details, the footnote says (actually not 100
> but N, where N is 64 or 96 or 100 or sth).
>
>
No. I mean the state from after Kc and the frame number are clocked in.
It's a semantic issue:
- either you state the goal as you did, put the 100 clockings in the
round function, and call the to-be-reversed function the 64 bits
directly generated by a certain state,
- or you state the goal as I did, and put the 100 clockings in the
to-be-reversed function (and keep the round function untouched).
But I realised that the efficiency gain is even bigger. If you find a
hit in the current tables, you wind back the state 100 times and find on
average 1 ancestor state. If you find a hit in the new tables, you can
go forward 100 times and back 100 times again, and find 13 sibling
states on average, which would generate the exact same keystream.
Some numbers, based on 100000 samples, clocking forward/back (F/B) or
back only (B) 100 times (#siblstates is the number of siblings,
including the original state):

#siblstates   #F/B     #B
          0      0  84664   (there were 0 states which, when clocked
                             forward/back 100 times, have 0 sibling
                             states; 84664 states can't be clocked back
                             100 times at all)
          1   2766   2830   (2766 states, when clocked forward/back 100
                             times, have 1 sibling state; 2830 states
                             give 1 state when clocked back 100 times)
          2   3341   1704   (etc.)
          3   5165   1776
          4   5501   1390
          5   5525   1175
          6   5487    961
          7   5682    845
          8   5482    699
          9   5065    530
         10   4927    511
         11   4532    381
         12   4229    365
         13   3991    331
         14   3633    264
         15   3341    236
         ..
         66     12      0
         67     10      0
         68      5      0
         69     11      0
         70      8      0
         71      6      1
         72      4      0
         73      5      0
         74      4      0
         75      2      0
         76      3      0
         77      2      0
         78      4      0
         79      1      0
         80      1      0
         82      4      0
         84      1      0
         86      1      0
         87      1      0
         89      2      0
         90      1      0
         91      1      0
         92      1      0
         93      1      0
         94      1      0
        118      1      0
       WAVG  13.09   1.00
Note that the 1.00 follows from theory as well. These numbers are pretty
stable.
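Since I'm asking others to redo the stats anyway: here is a small
self-contained sketch of how I'd sample them. It is my own
reconstruction from the published A5/1 description (taps 18,17,16,13 /
21,20 / 22,21,20,7; clocking bits 8, 10, 10), not our production code.
Because the MSB of each register is among its taps, undoing one shift
has exactly one candidate MSB, so every state has 0 to 4 predecessors:

```python
import random

MASK = (0x7FFFF, 0x3FFFFF, 0x7FFFFF)   # R1, R2, R3: 19, 22, 23 bits
TAPS = (0x72000, 0x300000, 0x700080)   # feedback taps, MSB included
MID  = (8, 10, 10)                     # majority-clocking bit positions
NBIT = (19, 22, 23)

def parity(x):
    return bin(x).count("1") & 1

def clock(s):
    """One forward clock of state s = (R1, R2, R3) with majority rule."""
    b = [(s[i] >> MID[i]) & 1 for i in range(3)]
    maj = 1 if sum(b) >= 2 else 0
    return tuple(((s[i] << 1) & MASK[i]) | parity(s[i] & TAPS[i])
                 if b[i] == maj else s[i] for i in range(3))

def unclock_one(i, r):
    """Undo one shift of register i; the shifted-out MSB is forced by
    the feedback bit, since the MSB is one of the taps."""
    rest = r >> 1
    msb = parity(rest & TAPS[i]) ^ (r & 1)
    return rest | (msb << (NBIT[i] - 1))

def predecessors(s):
    """All states that clock forward to s: between 0 and 4 of them."""
    out = set()
    for sub in ((0, 1), (0, 2), (1, 2), (0, 1, 2)):
        cand = list(s)
        for i in sub:
            cand[i] = unclock_one(i, cand[i])
        b = [(cand[i] >> MID[i]) & 1 for i in range(3)]
        maj = 1 if sum(b) >= 2 else 0
        if tuple(i for i in range(3) if b[i] == maj) == sub:
            out.add(tuple(cand))
    return out

def ancestors(s, n):
    """States that reach s after exactly n forward clocks."""
    frontier = {s}
    for _ in range(n):
        frontier = set().union(*map(predecessors, frontier))
        if not frontier:
            break
    return frontier

def forward(s, n):
    for _ in range(n):
        s = clock(s)
    return s

random.seed(1)
samples = [tuple(random.getrandbits(NBIT[i]) for i in range(3))
           for _ in range(100)]
dead = sum(1 for s in samples if not ancestors(s, 100))
sib = [len(ancestors(forward(s, 100), 100)) for s in samples[:40]]
print("dead fraction (B):", dead / len(samples))      # email: ~84.7%
print("avg #siblstates (F/B):", sum(sib) / len(sib))  # email: ~13
```

With only 100 samples the numbers wobble, of course; the point is just
that anyone can reproduce the two headline figures independently.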
So the tables are 13 times more effective. The cpu cost is 2.3 times
higher, so the net figure comes to 13/2.3, about 5.6 times more
efficient. (But we may need more colors in our rainbow - see below.)
>> i.e. take N=100. For implementation efficiency maybe better take a
>> multiple of 8, so N=92.
>>
>
> bitslicing doesn't care if its mod 8 or a prime and all of our (soon to
> exist) implementations use bitslicing.
>
>
It might be wise to keep the implementation friendly to those trying to
make lookup-table versions. But 100 is a multiple of 4, so that might be
OK as well.
>> When cracking, you should then of course still forward N times and
>> then clockback again to find all sibling states leading to the same
>> cipherstream.
>>
>>
>
> Ok, i am still a bit confused so i made a checklist and proposal for a
> consensus:
>
> 1) after loading the previous round keystream into our A5/1 registers,
> we clock 100 times forward, then generate 64 bits of new keystream.
> Then the round function is applied and the loop continues.
>
>
agreed. So the steps are:

    loop {
        S = Sinitial
        S = round(S)      // apply the round (color) function
        S = F(S, 100)     // clock forward 100 times
        K = K(S, 64)      // generate 64 bits of keystream
        Sinitial = K
    }
and for every chain we record the first Sinitial and the last K generated.
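As a sketch, the loop above in Python, with my own reconstruction of
A5/1's forward clocking, an assumed packing of the 64-bit state into
the three registers (19+22+23 bits), and a purely hypothetical toy
round/color function - the real round functions are exactly what still
needs to be chosen:

```python
MASK = (0x7FFFF, 0x3FFFFF, 0x7FFFFF)   # R1, R2, R3: 19, 22, 23 bits
TAPS = (0x72000, 0x300000, 0x700080)   # feedback taps
MID  = (8, 10, 10)                     # majority-clocking bit positions

def parity(x):
    return bin(x).count("1") & 1

def clock(s):
    """One forward clock with the majority rule."""
    b = [(s[i] >> MID[i]) & 1 for i in range(3)]
    maj = 1 if sum(b) >= 2 else 0
    return tuple(((s[i] << 1) & MASK[i]) | parity(s[i] & TAPS[i])
                 if b[i] == maj else s[i] for i in range(3))

def unpack(x):   # assumed layout: R1 in bits 0-18, R2 19-40, R3 41-63
    return (x & MASK[0], (x >> 19) & MASK[1], (x >> 41) & MASK[2])

def F(s, n):     # S = F(S, n): clock forward n times
    for _ in range(n):
        s = clock(s)
    return s

def K(s, n):     # K = K(S, n): squeeze out n keystream bits
    out = 0
    for _ in range(n):
        s = clock(s)
        out = (out << 1) | (((s[0] >> 18) ^ (s[1] >> 21) ^ (s[2] >> 22)) & 1)
    return out

def round_fn(k, color):
    """HYPOTHETICAL toy round/color function (xor with a per-color
    constant) - just a stand-in so the chain step below runs."""
    return k ^ (0x9E3779B97F4A7C15 * (color + 1) & 0xFFFFFFFFFFFFFFFF)

def chain(start, links):
    k = start
    for color in range(links):
        s = unpack(round_fn(k, color))   # S = round(Sinitial)
        s = F(s, 100)                    # the 100 initial clockings
        k = K(s, 64)                     # 64 bits of new keystream
    return k                             # record (start, k) for the table
```

E.g. chain(0x0123456789ABCDEF, 8) deterministically yields the chain
end for an 8-link chain with this toy round function.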
> 2) The whole keyspace shrinks to below 16% of 2^64
>
>
Depending on what you would call "key" - the initial state space is
still 2^64. The 100-round state space, and thus the generated keystream
space, is 84% smaller, so that's a source of chain mergers if we don't
use the right round function. All states but the first in a chain are
then members of a subset defined by the 100 rounds and the round
function. If we don't want to reduce our chances of finding a certain
keystream by 84%, we have to make sure we have a set of round functions
that together map this subset back onto the entire 2^64 spectrum. (So
apart from the chain-merger issue, we actually *need* at least
1/0.16 = 6.25 round functions to keep covering the entire 2^64 state
space.) The answer to the question I posted above - how one could
determine what this subset looks like - could save us some headaches.
(I had the idea to xor the previous state into the new key, so you
would have almost no chain mergers. But that would also make lookup
impossible...)
Maybe we should add some colors as well - intuition says to aim for at
least 32*(1/0.16) = 200 in total. (Although the source of chain mergers
until now might mostly be this state convergence as well, so we might
live with 32 colors.) This needs to be determined empirically. I can't
do those statistics here without a CUDA card, especially since you would
need some 100000 samples to get stable numbers.
> 3) Therefore we need only compute 16% the amount of tables that we
> initially thought.
>
Or 1/13th, since lookups are 13 times as effective.
> 3.5) It takes 2 to 3 times longer to generate the tables and do lookup
> precomputation.
>
true.
> 3.5a) Generation time is halved, storage requirement shrinks to 16%
>
To cover the complete keystream space, yes. To be as effective as now,
storage shrinks to 1/13, or about 8%.
> 4) There is still a small chance that we find an illegal state in our
> database, as
> bits in the burst get encrypted with keystream that was produced not after
> 100,
> but 100 + N clockings
>
Clocking forward 100 times opens up 12 alternative pathways on average,
so there are ways around this. In 2.7% of cases (see above) you would
hit only the exact state you came from; in the other cases there's a
chance to get around the illegal state. So this chance is small*small.
With reference to the mail I sent yesterday, this is also a method to
find even more states which in the end generate the same keystream:
clock forward 101 times, then clock back 1 time into another path while
checking whether the same keystream bit would have been generated. That
gives roughly a 5% chance of finding more states leading to the same
keystream. (And do the same with 100+N clockings forward, for N=2..16,
to get even more sibling states.)
By the way: to verify whether a state is in fact the state that led up
to the keystream, you only have to generate the rest of the keystream
and match it against the actual keystream. The same holds for
discarding streams after 100+N clockings, where you would match the
bits generated before the bits you've looked up. Considering the growth
of ancestor states (some 5% per clock at most) and the information in
the keystream (one bit, i.e. a factor of 2, per clock) you can be sure
to have the right Kc even before looking at another burst of the
conversation.
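That verification step is cheap. A sketch (again my own reconstruction
of A5/1's forward clocking from the published description, with an
illustrative candidate state): regenerate keystream from the candidate
and compare it bit for bit against what was observed:

```python
MASK = (0x7FFFF, 0x3FFFFF, 0x7FFFFF)   # R1, R2, R3: 19, 22, 23 bits
TAPS = (0x72000, 0x300000, 0x700080)   # feedback taps
MID  = (8, 10, 10)                     # majority-clocking bit positions

def parity(x):
    return bin(x).count("1") & 1

def clock(s):
    """One forward clock of (R1, R2, R3) with the majority rule;
    a reconstruction for illustration, not our production code."""
    b = [(s[i] >> MID[i]) & 1 for i in range(3)]
    maj = 1 if sum(b) >= 2 else 0
    return tuple(((s[i] << 1) & MASK[i]) | parity(s[i] & TAPS[i])
                 if b[i] == maj else s[i] for i in range(3))

def keystream_bits(s, n):
    """Generate n keystream bits from state s."""
    out = []
    for _ in range(n):
        s = clock(s)
        out.append(((s[0] >> 18) ^ (s[1] >> 21) ^ (s[2] >> 22)) & 1)
    return out

def verify(candidate, observed_bits):
    """A candidate state survives only if it reproduces every observed
    keystream bit; a wrong candidate survives n extra bits with
    probability about 2^-n, so a handful of bits settles it."""
    return keystream_bits(candidate, len(observed_bits)) == observed_bits
```

So matching, say, the 50 burst bits that weren't used for the lookup
leaves a false candidate only a ~2^-50 chance of surviving.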
I didn't want to mess up the discussion too much before, but here's
another thought to consider: how about clocking forward not 100 times
but, say, 116? This would reduce our chances of finding a hit within the
first 64+15 bits of keystream, but it also reduces the initial state
space by 57%, i.e. we could drop one bit of data from every start value.
That's 2% smaller tables. I don't think it's worth the mess it creates
when you actually try to save that one bit, but I wanted to share the
thought at least - it might inspire other ideas.
> 5) clocking 100 times is not optimal in terms of work per optimization but
> with a slight
> overhead the tables are cleaner (i suggest)
>
I agree. The end-to-end picture of what we're doing is somewhat easier
to explain as well.
I would like to repeat my request that somebody redo these stats and
confirm this is all OK.
Also I think it would be wise to put some of these findings and
statistics in a central place (say, the wiki, to which I have no access)
for later reference and verification. There we could split it into three
or more parts: statistics, table generation, and cracking/reverse
clocking.
Regards,
M.
_______________________________________________
A51 mailing list
[email protected]
http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51