Re: [A51] 2.5x efficiency increase by removing illegal states

sascha Sun, 10 Jan 2010 20:09:50 -0800

On Mon, Jan 11, 2010 at 12:11:53AM +0100, M vd S wrote:
> On 1/10/10 8:41 PM, sascha wrote:
> >On Sun, Jan 10, 2010 at 05:15:46PM +0100, M vd S wrote:
> >>Apart from the optimization, we can also observe that the state
> >>space converges to 2^64 * (1-84%) before generating the keystream.
> >>This must also mean that only 16% of 2^64 possible 64 bit keystream
> >>can exist. I wouldn't know how this can be of value to us right now,
> >>but it is worth noting.
> >
> >well it is of great value: our tables need only be 16% the size they were
> >before. at the cost of 2-3 times more computational effort.
> >
> I aimed at the fact that we will not see all possible 64 bit
> keystreams in practice. So there's some redundancy hidden in there.
> 
> Looking at the last 64 bits of keystream used to cipher the first
> 114 bits, one in ten combinations don't exist. That's a weakness of


you meant 9 in 10!?

> the cipher in itself, as if we would know that bits A B and C are
> always one. Which - if it were that simple - would allow us to say
> something about the contents of the burst.

the strength of the cipher is reduced by a factor of 6, now you have
a 2^61.5 state cipher, thats still pretty complex.

> >the extra time needed is
> >N    factor
> >32   1.47
> >64   1.89
> >96   2.3
> >not xoring, not storing in the extra rounds.
> >total execution time of a 8192 operations kernel taken as basis.
> 
> Thanks. I was thinking that other more powerful optimizations are
> possible when not saving cipherstream. You could take a lookup table
> to determine the number of shifts per lfsr, and shift per bunch of
> places instead of the one-by-one shift you would normally do to get
> the xorring right. I.e. the exact order of shifting doesn't matter
> anymore.


due to the bitslicing none of those optimizations are applicable.
I was expecting a smaller penalty also, due to the fact that i could
avoid DRAM access. The big bottleneck is still the shifting, you have
to do it in a loop touching 64 registers and you have to do it bit by
bit, because you have 64 states mixed with different shift counts and
you have to treat them all equally.

> 
> Anyway. The table now looks like:
> N    S    cpu cost    table eff    total eff
> 0    84%    1.00    1.00    1.00
> 1    75%    1.02    1.56    1.54
> 2    72%    1.03    1.75    1.70
> 4    68%    1.06    2.00    1.88
> 8    63%    1.13    2.31    2.06
> 16    53%    1.25    2.94    2.35
> 32    37%    1.47    3.94    2.68
> 64    12%    1.89    5.50    2.91
> 96    0%    2.30    6.25    2.72
> 
> So I would say we should restate our cracking goal from:
> 
> "a mapping from the internal state of the A5/1 algorithm (64 bits)
> to the first 64 bits of keystream that get generated from that
> initial internal state. "
> 
> to
> 
> "a mapping from the internal state of the A5/1 algorithm (64 bits)
> _after the loading of Kc and the frame number_ to the first 64 bits
> of keystream that get generated from that initial internal state. "
> 

you mean:
"a mapping from the internal state of the A5/1 algorithm (64 bits)
that can occur after 100 initial rounds to the first ..."
Kc and frame number are clocked in regularly.
for those interested in details, the footnote says (actually not 100
but N, where N is 64 or 96 or 100 or sth).

> i.e. take N=100. For implementation efficiency maybe better take a
> multiple of 8, so N=92.

bitslicing doesn't care if its mod 8 or a prime and all of our (soon to
exist) implementations use bitslicing.

> 
> When cracking, you should then of course still forward N times and
> then clockback again to find all sibling states leading to the same
> cipherstream.
> 

Ok, i am still a bit confused so i made a checklist and proposal for a 
consensus:

1) after loading the previous round keystream into our A5/1 registers,
   we clock 100 times forward, then generate 64 bits of new keystream.
   Then the round function is applied and the loop continues.

2) The whole keyspace shrinks to below 16% of 2^64

3)    Therefore we need only compute 16% the amount of tables that we initially 
thought.
3.5)  It takes 2 to 3 times longer to generate the tables and do lookup 
precomputation.
3.5a) Generation time is halved, storage requirement shrinks to 16%

4) There is still a small chance that we find an illegal state in our database, 
as
   bits in the burst get encrypted with keystream that was produced not after 
100,
   but 100 + N clockings

5) clocking 100 times is not optimal in terms of work per optimization but with 
a slight
   overhead the tables are cleaner (i suggest) 
_______________________________________________
A51 mailing list
[email protected]
http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51

Re: [A51] 2.5x efficiency increase by removing illegal states

Reply via email to