On 1/11/10 5:09 AM, sascha wrote:
> On Mon, Jan 11, 2010 at 12:11:53AM +0100, M vd S wrote:
>
>> On 1/10/10 8:41 PM, sascha wrote:
>>
>>> On Sun, Jan 10, 2010 at 05:15:46PM +0100, M vd S wrote:
>>>
>>>> Apart from the optimization, we can also observe that the state
>>>> space converges to 2^64 * (1-84%) before generating the keystream.
>>>> This must also mean that only 16% of 2^64 possible 64 bit keystream
>>>> can exist. I wouldn't know how this can be of value to us right now,
>>>> but it is worth noting.
>>>>
>>> well it is of great value: our tables need only be 16% the size they were
>>> before. at the cost of 2-3 times more computational effort.
>>>
>>>
>> I aimed at the fact that we will not see all possible 64 bit
>> keystreams in practice. So there's some redundancy hidden in there.
>>
>> Looking at the last 64 bits of keystream used to cipher the first
>> 114 bits, one in ten combinations don't exist. That's a weakness of
>>
>
> you meant 9 in 10!?
>
>
Yes, correct!
>> the cipher in itself, as if we would know that bits A B and C are
>> always one. Which - if it were that simple - would allow us to say
>> something about the contents of the burst.
>>
>
> the strength of the cipher is reduced by a factor of 6, now you have
> a 2^61.5 state cipher, thats still pretty complex.
>
>
A factor of 6 to 16, depending on which part of the 2*114-bit keystream
you look at.
But like I said, I don't see any clear way to generate or detect illegal
keystreams. Maybe someone here has an idea.
..
>> Anyway. The table now looks like:
>>   N    S    cpu cost   table eff   total eff
>>   0   84%     1.00        1.00        1.00
>>   1   75%     1.02        1.56        1.54
>>   2   72%     1.03        1.75        1.70
>>   4   68%     1.06        2.00        1.88
>>   8   63%     1.13        2.31        2.06
>>  16   53%     1.25        2.94        2.35
>>  32   37%     1.47        3.94        2.68
>>  64   12%     1.89        5.50        2.91
>>  96    0%     2.30        6.25        2.72
>>
>> So I would say we should restate our cracking goal from:
>>
>> "a mapping from the internal state of the A5/1 algorithm (64 bits)
>> to the first 64 bits of keystream that get generated from that
>> initial internal state. "
>>
>> to
>>
>> "a mapping from the internal state of the A5/1 algorithm (64 bits)
>> _after the loading of Kc and the frame number_ to the first 64 bits
>> of keystream that get generated from that initial internal state. "
>>
>>
>
> you mean:
> "a mapping from the internal state of the A5/1 algorithm (64 bits)
> that can occur after 100 initial rounds to the first ..."
> Kc and frame number are clocked in regularly.
> for those interested in details, the footnote says (actually not 100
> but N, where N is 64 or 96 or 100 or sth).
>
>
No. I mean the state from after Kc and the frame number are clocked in.
It's a semantic issue:
- either you state the goal as you did, put the 100 clockings in the
round function, and call the to-be-reversed function the 64 bits
directly generated by a certain state,
- or you state the goal as I did, and put the 100 clockings in the
to-be-reversed function (and keep the round function untouched).
But I realised that the efficiency gain is even bigger. If you find a
hit in the current tables, you wind back the state 100 times and find on
average 1 ancestor state. If you find a hit in the new tables, you can
go forward 100 times and back 100 times again, and find 13 sibling
states on average, which would generate the exact same keystream.
Some numbers, based on 100000 samples, clocking forward/back (F/B) or
back only (B) 100 times (#siblstates is the number of siblings,
including the original state):

#siblstates   #F/B     #B
          0      0  84664   (there were 0 states which, when clocked
                             forward/back 100 times, have 0 sibling
                             states; 84664 states can't be clocked back
                             100 times at all)
          1   2766   2830   (2766 states, when clocked forward/back 100
                             times, have 1 sibling state; 2830 states
                             give 1 state when clocked back 100 times)
          2   3341   1704   (etc.)
          3   5165   1776
          4   5501   1390
          5   5525   1175
          6   5487    961
          7   5682    845
          8   5482    699
          9   5065    530
         10   4927    511
         11   4532    381
         12   4229    365
         13   3991    331
         14   3633    264
         15   3341    236
         ..
         66     12      0
         67     10      0
         68      5      0
         69     11      0
         70      8      0
         71      6      1
         72      4      0
         73      5      0
         74      4      0
         75      2      0
         76      3      0
         77      2      0
         78      4      0
         79      1      0
         80      1      0
         82      4      0
         84      1      0
         86      1      0
         87      1      0
         89      2      0
         90      1      0
         91      1      0
         92      1      0
         93      1      0
         94      1      0
        118      1      0
       WAVG  13.09   1.00
Note that the 1.00 follows from theory as well. These numbers are pretty
stable.
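Since I'm asking others to redo the stats anyway: here is a small
self-contained sketch of how I'd sample them. It is my own
reconstruction from the published A5/1 description (taps 18,17,16,13 /
21,20 / 22,21,20,7; clocking bits 8, 10, 10), not our production code.
Because the MSB of each register is among its taps, undoing one shift
has exactly one candidate MSB, so every state has 0 to 4 predecessors:

```python
import random

MASK = (0x7FFFF, 0x3FFFFF, 0x7FFFFF)   # R1, R2, R3: 19, 22, 23 bits
TAPS = (0x72000, 0x300000, 0x700080)   # feedback taps, MSB included
MID  = (8, 10, 10)                     # majority-clocking bit positions
NBIT = (19, 22, 23)

def parity(x):
    return bin(x).count("1") & 1

def clock(s):
    """One forward clock of state s = (R1, R2, R3) with majority rule."""
    b = [(s[i] >> MID[i]) & 1 for i in range(3)]
    maj = 1 if sum(b) >= 2 else 0
    return tuple(((s[i] << 1) & MASK[i]) | parity(s[i] & TAPS[i])
                 if b[i] == maj else s[i] for i in range(3))

def unclock_one(i, r):
    """Undo one shift of register i; the shifted-out MSB is forced by
    the feedback bit, since the MSB is one of the taps."""
    rest = r >> 1
    msb = parity(rest & TAPS[i]) ^ (r & 1)
    return rest | (msb << (NBIT[i] - 1))

def predecessors(s):
    """All states that clock forward to s: between 0 and 4 of them."""
    out = set()
    for sub in ((0, 1), (0, 2), (1, 2), (0, 1, 2)):
        cand = list(s)
        for i in sub:
            cand[i] = unclock_one(i, cand[i])
        b = [(cand[i] >> MID[i]) & 1 for i in range(3)]
        maj = 1 if sum(b) >= 2 else 0
        if tuple(i for i in range(3) if b[i] == maj) == sub:
            out.add(tuple(cand))
    return out

def ancestors(s, n):
    """States that reach s after exactly n forward clocks."""
    frontier = {s}
    for _ in range(n):
        frontier = set().union(*map(predecessors, frontier))
        if not frontier:
            break
    return frontier

def forward(s, n):
    for _ in range(n):
        s = clock(s)
    return s

random.seed(1)
samples = [tuple(random.getrandbits(NBIT[i]) for i in range(3))
           for _ in range(100)]
dead = sum(1 for s in samples if not ancestors(s, 100))
sib = [len(ancestors(forward(s, 100), 100)) for s in samples[:40]]
print("dead fraction (B):", dead / len(samples))      # email: ~84.7%
print("avg #siblstates (F/B):", sum(sib) / len(sib))  # email: ~13
```

With only 100 samples the numbers wobble, of course; the point is just
that anyone can reproduce the two headline figures independently.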
So the tables are 13 times more effective. The cpu cost is 2.3 times
higher, so the net figure comes to 13/2.3, about 5.6 times more
efficient. (But we may need more colors in our rainbow - see below.)
>> i.e. take N=100. For implementation efficiency maybe better take a
>> multiple of 8, so N=92.
>>
>
> bitslicing doesn't care if its mod 8 or a prime and all of our (soon to
> exist) implementations use bitslicing.
>
>
It might be wise to keep the implementation friendly to those trying to
make lookup-table versions. But 100 is a multiple of 4, so that might be
OK as well.
>> When cracking, you should then of course still forward N times and
>> then clockback again to find all sibling states leading to the same
>> cipherstream.
>>
>>
>
> Ok, i am still a bit confused so i made a checklist and proposal for a
> consensus:
>
> 1) after loading the previous round keystream into our A5/1 registers,
> we clock 100 times forward, then generate 64 bits of new keystream.
> Then the round function is applied and the loop continues.
>
>
agreed. So the steps are:

    loop {
        S = Sinitial
        S = round(S)      // apply the round (color) function
        S = F(S, 100)     // clock forward 100 times
        K = K(S, 64)      // generate 64 bits of keystream
        Sinitial = K
    }
and for every chain we record the first Sinitial and the last K generated.
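As a sketch, the loop above in Python, with my own reconstruction of
A5/1's forward clocking, an assumed packing of the 64-bit state into
the three registers (19+22+23 bits), and a purely hypothetical toy
round/color function - the real round functions are exactly what still
needs to be chosen:

```python
MASK = (0x7FFFF, 0x3FFFFF, 0x7FFFFF)   # R1, R2, R3: 19, 22, 23 bits
TAPS = (0x72000, 0x300000, 0x700080)   # feedback taps
MID  = (8, 10, 10)                     # majority-clocking bit positions

def parity(x):
    return bin(x).count("1") & 1

def clock(s):
    """One forward clock with the majority rule."""
    b = [(s[i] >> MID[i]) & 1 for i in range(3)]
    maj = 1 if sum(b) >= 2 else 0
    return tuple(((s[i] << 1) & MASK[i]) | parity(s[i] & TAPS[i])
                 if b[i] == maj else s[i] for i in range(3))

def unpack(x):   # assumed layout: R1 in bits 0-18, R2 19-40, R3 41-63
    return (x & MASK[0], (x >> 19) & MASK[1], (x >> 41) & MASK[2])

def F(s, n):     # S = F(S, n): clock forward n times
    for _ in range(n):
        s = clock(s)
    return s

def K(s, n):     # K = K(S, n): squeeze out n keystream bits
    out = 0
    for _ in range(n):
        s = clock(s)
        out = (out << 1) | (((s[0] >> 18) ^ (s[1] >> 21) ^ (s[2] >> 22)) & 1)
    return out

def round_fn(k, color):
    """HYPOTHETICAL toy round/color function (xor with a per-color
    constant) - just a stand-in so the chain step below runs."""
    return k ^ (0x9E3779B97F4A7C15 * (color + 1) & 0xFFFFFFFFFFFFFFFF)

def chain(start, links):
    k = start
    for color in range(links):
        s = unpack(round_fn(k, color))   # S = round(Sinitial)
        s = F(s, 100)                    # the 100 initial clockings
        k = K(s, 64)                     # 64 bits of new keystream
    return k                             # record (start, k) for the table
```

E.g. chain(0x0123456789ABCDEF, 8) deterministically yields the chain
end for an 8-link chain with this toy round function.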
> 2) The whole keyspace shrinks to below 16% of 2^64
>
>
Depending on what you would call "key" - the initial state space is
still 2^64. The 100-round state space, and thus the generated keystream
space, is 84% smaller, so that's a source of chain mergers if we don't
use the right round function. All states but the first in a chain are
then members of a subset defined by the 100 rounds and the round
function. If we don't want to reduce our chances of finding a certain
keystream by 84%, we have to make sure we have a set of round functions
that together map this subset back onto the entire 2^64 spectrum. (So
apart from the chain-merger issue, we actually *need* at least
1/0.16 = 6.25 round functions to keep covering the entire 2^64 state
space.) The answer to the question I posted above - how one could
determine what this subset looks like - could save us some headaches.
(I had the idea to xor the previous state into the new key, so you
would have almost no chain mergers. But that would also make lookup
impossible...)
Maybe we should add some colors as well - intuition says to aim for at
least 32*(1/0.16) = 200 in total. (Although the source of chain mergers
until now might mostly be this state convergence as well, so we might
live with 32 colors.) This needs to be determined empirically. I can't
do those statistics here without a CUDA card, especially since you would
need some 100000 samples to get stable numbers.
> 3) Therefore we need only compute 16% the amount of tables that we
> initially thought.
>
Or 1/13th, since lookups are 13 times as effective.
> 3.5) It takes 2 to 3 times longer to generate the tables and do lookup
> precomputation.
>
true.
> 3.5a) Generation time is halved, storage requirement shrinks to 16%
>
To cover the complete keystream space, yes. To be as effective as now,
storage shrinks to 1/13, or about 8%.
> 4) There is still a small chance that we find an illegal state in our
> database, as
> bits in the burst get encrypted with keystream that was produced not after
> 100,
> but 100 + N clockings
>
Clocking forward 100 times opens up 12 alternative pathways on average,
so there are ways around this. In 2.7% of cases (see above) you would
hit only the exact state you came from; in the other cases there's a
chance to get around the illegal state. So this chance is small*small.
With reference to the mail I sent yesterday, this is also a method to
find even more states which in the end generate the same keystream:
clock forward 101 times, then clock back 1 time into another path while
checking whether the same keystream bit would have been generated. That
gives roughly a 5% chance of finding more states leading to the same
keystream. (And do the same with 100+N clockings forward, for N=2..16,
to get even more sibling states.)
By the way: to verify whether a state is in fact the state that led up
to the keystream, you only have to generate the rest of the keystream
and match it against the actual keystream. The same holds for
discarding streams after 100+N clockings, where you would match the
bits generated before the bits you've looked up. Considering the growth
of ancestor states (some 5% per clock at most) and the information in
the keystream (one bit, i.e. a factor of 2, per clock) you can be sure
to have the right Kc even before looking at another burst of the
conversation.
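That verification step is cheap. A sketch (again my own reconstruction
of A5/1's forward clocking from the published description, with an
illustrative candidate state): regenerate keystream from the candidate
and compare it bit for bit against what was observed:

```python
MASK = (0x7FFFF, 0x3FFFFF, 0x7FFFFF)   # R1, R2, R3: 19, 22, 23 bits
TAPS = (0x72000, 0x300000, 0x700080)   # feedback taps
MID  = (8, 10, 10)                     # majority-clocking bit positions

def parity(x):
    return bin(x).count("1") & 1

def clock(s):
    """One forward clock of (R1, R2, R3) with the majority rule;
    a reconstruction for illustration, not our production code."""
    b = [(s[i] >> MID[i]) & 1 for i in range(3)]
    maj = 1 if sum(b) >= 2 else 0
    return tuple(((s[i] << 1) & MASK[i]) | parity(s[i] & TAPS[i])
                 if b[i] == maj else s[i] for i in range(3))

def keystream_bits(s, n):
    """Generate n keystream bits from state s."""
    out = []
    for _ in range(n):
        s = clock(s)
        out.append(((s[0] >> 18) ^ (s[1] >> 21) ^ (s[2] >> 22)) & 1)
    return out

def verify(candidate, observed_bits):
    """A candidate state survives only if it reproduces every observed
    keystream bit; a wrong candidate survives n extra bits with
    probability about 2^-n, so a handful of bits settles it."""
    return keystream_bits(candidate, len(observed_bits)) == observed_bits
```

So matching, say, the 50 burst bits that weren't used for the lookup
leaves a false candidate only a ~2^-50 chance of surviving.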
I didn't want to mess up the discussion too much before, but here's
another thought to consider: how about clocking forward not 100 times
but, say, 116? This would reduce our chances of finding a hit within the
first 64+15 bits of keystream, but it also reduces the initial state
space by 57%, i.e. we could drop one bit of data from every start value.
That's 2% smaller tables. I don't think it's worth the mess it creates
when you actually try to save that one bit, but I wanted to share the
thought at least - it might inspire other ideas.
> 5) clocking 100 times is not optimal in terms of work per optimization but
> with a slight
> overhead the tables are cleaner (i suggest)
>
I agree. The end-to-end picture of what we're doing is somewhat easier
to explain as well.
I would like to repeat my request that somebody redo these stats and
confirm this is all OK.
Also I think it would be wise to put some of these findings and
statistics in a central place (say, the wiki, to which I have no access)
for later reference and verification. There we could split it into three
or more parts: statistics, table generation, and cracking/reverse
clocking.
Regards,
M.
_______________________________________________
A51 mailing list
[email protected]
http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51