On Wed, Jan 19, 2011 at 9:31 PM, <[email protected]<travis%[email protected]> > wrote:
> On Thu, Jan 20, 2011 at 12:49:26PM +1100, Noon Silk wrote: > > Sounds to me like the simplist solution is just a one-time pad[1]. It > > won't increase the size, and from the sounds of your environment, you > > can just keep the keys locally, and use them only when you do the > > debugging. But perhaps I'm misunderstanding your question. > > Yeah, I knew people would tend not to answer my question and simply > provide solutions which won't work due to the context. But possibly > someone will give me something I haven't thought of, so let me explain > further. > > OTP won't work - simply XORing a printable character with a non-printable > won't guarantee a printable, for example, and symbols have to be printable. > > It'd be much simpler to just map symbols to ordinal values, but that > has the following problem: > > The releases may have different sets of symbols, in different orders. > > Furthermore, the symbols have to map to the same thing on subsequent > releases so crashes can be correlated across releases. > > Finally, it's a burden for data to have to be propogated from the > obfuscation run on one release to the next. They might be done by > different groups who don't normally communicate, for example, or there > could be release of different branches so no strict ordering in place > (apart from temporal, and that's kinda hard to enforce; one person > fails to update the obfuscated symbol mapping, or whatever shared > state is supposed to be passed along, and everything's hosed). > Do the following: 1. You will need to make a choice of either leaking identifier length, or having a maximum identifier length. This is because an encryption algorithm is a reversible mapping, and needs to operate over a finite set of elements. You can either have a distinct set for each identifier length or have a single set which has a finite number of possible elements thanks to the raw token length being constrained. If there are resource constraints (e.g. tokens can be a significant fraction library size), be prepared for the likelihood that if you take the second path (not leaking token size), that all of your encrypted tokens will be near the maximum size (so, if you pick 100-character max size, so you won't constrain your programmers, expect that encrypted tokens will all be close to 100 characters long). 2. Develop a one-to-one mapping between symbols expressible in your grammar and numeric values with a finite scope. For example, let's say symbols are C++-legal tokens, which can begin with a letter or _ (case-sensitive, 53 alternatives) and then each following character can be a letter, a digit, or an underscore (63 alternatives). In that case, you could derive a value by mapping each character to a value 0-63 (where the digit characters are mapped to values 54-63) and construct a numeric value for the token as follows: * c_i is the numeric value for character i of the token (where c_0 is the first character) * the value of the token = c_0 + c_1 * 53 + c_2 * (53 * 63) + c_3 * (53 * 63^2) + ... + c_n * (53 * 63^(n-1)) * (This is the same as constructing a number which is base 63, except for being base 53 in the least-significant digit). For more complex grammars, it can be more challenging to construct the mapping, but it should be possible. 3. You now have a bidirectional mapping between legal tokens and values 0..MAX, where MAX is the token with the highest possible value (c_0..c_n are each selected to be 53 or 63, as appropriate, in our example). 4. You can now construct an encryption algorithm which encrypts within this set. Alternatives include: a. Express the value as a bitstring of n bits long (where 2^(n-1) <= MAX < 2^n), and construct an n-bit block cipher using a balanced or unbalanced Feistel construction, where you can use a strong block cipher (e.g. AES) as the f() function and require only a few rounds (I don't recall the minimum for strong security, 3 or 4). If the encrypted value is greater than MAX, encrypt again until it is not; this will require, on average, less than 2 applications of the block cipher before you get a value <= MAX. Then, reverse the token encoding to get a random-looking token that can be decrypted by reversing the procedure. b. Express the value as a bitstring and use a stream cipher. Similarly, if the encrypted value ends up overflowing MAX, reencrypt until it doesn't. You'll need to rekey the cipher for each token, so create a salt from the original token, perhaps by hashing it with SHA-1 or similar, and take a certain number of the bits (suitable to minimize the possibility of salt collision across the number of tokens to be encrypted with a single key). Map the salt to token-legal grammar and emit encoded tokens by prepending the salt to your (token-encoded) encrypted result (you may need something somewhat more complex for certain token grammars). I hope that's intelligible. Good luck! - Tim
_______________________________________________ cryptography mailing list [email protected] http://lists.randombit.net/mailman/listinfo/cryptography
