Re: [cryptography] obfuscating symbols without increasing their size

Tim Dierks Wed, 19 Jan 2011 19:22:45 -0800

On Wed, Jan 19, 2011 at 9:31 PM,
<[email protected]<travis%[email protected]>
> wrote:

> On Thu, Jan 20, 2011 at 12:49:26PM +1100, Noon Silk wrote:
> > Sounds to me like the simplist solution is just a one-time pad[1]. It
> > won't increase the size, and from the sounds of your environment, you
> > can just keep the keys locally, and use them only when you do the
> > debugging. But perhaps I'm misunderstanding your question.
>
> Yeah, I knew people would tend not to answer my question and simply
> provide solutions which won't work due to the context.  But possibly
> someone will give me something I haven't thought of, so let me explain
> further.
>
> OTP won't work - simply XORing a printable character with a non-printable
> won't guarantee a printable, for example, and symbols have to be printable.
>
> It'd be much simpler to just map symbols to ordinal values, but that
> has the following problem:
>
> The releases may have different sets of symbols, in different orders.
>
> Furthermore, the symbols have to map to the same thing on subsequent
> releases so crashes can be correlated across releases.
>
> Finally, it's a burden for data to have to be propogated from the
> obfuscation run on one release to the next.  They might be done by
> different groups who don't normally communicate, for example, or there
> could be release of different branches so no strict ordering in place
> (apart from temporal, and that's kinda hard to enforce; one person
> fails to update the obfuscated symbol mapping, or whatever shared
> state is supposed to be passed along, and everything's hosed).
>

Do the following:
 1. You will need to make a choice of either leaking identifier length, or
having a maximum identifier length. This is because an encryption algorithm
is a reversible mapping, and needs to operate over a finite set of elements.
You can either have a distinct set for each identifier length or have a
single set which has a finite number of possible elements thanks to the raw
token length being constrained. If there are resource constraints (e.g.
tokens can be a significant fraction library size), be prepared for the
likelihood that if you take the second path (not leaking token size), that
all of your encrypted tokens will be near the maximum size (so, if you pick
100-character max size, so you won't constrain your programmers, expect that
encrypted tokens will all be close to 100 characters long).

 2. Develop a one-to-one mapping between symbols expressible in your grammar
and numeric values with a finite scope. For example, let's say symbols are
C++-legal tokens, which can begin with a letter or _ (case-sensitive, 53
alternatives) and then each following character can be a letter, a digit, or
an underscore (63 alternatives). In that case, you could derive a value by
mapping each character to a value 0-63 (where the digit characters are
mapped to values 54-63) and construct a numeric value for the token as
follows:
 * c_i is the numeric value for character i of the token (where c_0 is the
first character)
 * the value of the token = c_0 + c_1 * 53 + c_2 * (53 * 63) + c_3 * (53 *
63^2) + ...  + c_n * (53 * 63^(n-1))
 * (This is the same as constructing a number which is base 63, except for
being base 53 in the least-significant digit).
For more complex grammars, it can be more challenging to construct the
mapping, but it should be possible.

 3. You now have a bidirectional mapping between legal tokens and values
0..MAX, where MAX is the token with the highest possible value (c_0..c_n are
each selected to be 53 or 63, as appropriate, in our example).

 4. You can now construct an encryption algorithm which encrypts within this
set. Alternatives include:
  a. Express the value as a bitstring of n bits long (where 2^(n-1) <= MAX <
2^n), and construct an n-bit block cipher using a balanced or unbalanced
Feistel construction, where you can use a strong block cipher (e.g. AES) as
the f() function and require only a few rounds (I don't recall the minimum
for strong security, 3 or 4). If the encrypted value is greater than MAX,
encrypt again until it is not; this will require, on average, less than 2
applications of the block cipher before you get a value <= MAX. Then,
reverse the token encoding to get a random-looking token that can be
decrypted by reversing the procedure.
  b. Express the value as a bitstring and use a stream cipher. Similarly, if
the encrypted value ends up overflowing MAX, reencrypt until it doesn't.
You'll need to rekey the cipher for each token, so create a salt from the
original token, perhaps by hashing it with SHA-1 or similar, and take a
certain number of the bits (suitable to minimize the possibility of salt
collision across the number of tokens to be encrypted with a single key).
Map the salt to token-legal grammar and emit encoded tokens by prepending
the salt to your (token-encoded) encrypted result (you may need something
somewhat more complex for certain token grammars).

I hope that's intelligible. Good luck!
 - Tim

_______________________________________________
cryptography mailing list
[email protected]
http://lists.randombit.net/mailman/listinfo/cryptography

Re: [cryptography] obfuscating symbols without increasing their size

Reply via email to