Re: Serious vulnerability in RSA and other crypto algorithms

Gabriel Withington Tue, 23 Jan 2018 14:07:41 -0800

I've ported the computationally intensive portion of my algorithm (the
search for two exponents 'a' and 'b' such that m^a % n = m^b % n) to C++
with arbitrary precision. I've attached the file and it includes building
and running instructions at the top. The speed is about comparable to Perl
with normal precision. And right now, the most obvious improvement is
probably to use arbitrary precision integers as the hash keys rather than
the string representation (but alas, this seems non-trivial using Boost
multiprecision and C++ unordered maps).


And I'm just starting to search the semiprime space but so far I've found
an 18 decimal digit semiprime (n = 655359999157953307)  which I was able to
find exponents 'a' and 'b' such that 2^a % n = 2^b % n (a = 1220703122 and
b = 2) in 0.008 seconds while calculating 90 powers of 2. In comparison,
I've performed this search for 165 9-digit semiprimes and the fastest
result was in 0.007 seconds calculating 145 powers of 2.

I know these are puny numbers compared to what is relevant to real world
cryptography but the point is that for some semiprimes, the computational
cost seems to be increasing slowly with key length. I suspect that some
real world keys will be vulnerable and that key length is not a good
indicator of key strength. Which is a very big problem because the only way
to know how secure a key is is to break it (and even then, it may be more
vulnerable to a different search pattern).

I hope this helps.

Gabriel

On Tue, Jan 16, 2018 at 1:10 PM, Gabriel Withington <
[email protected]> wrote:

> There is a serious flaw in cryptography based on semiprimes. While
> attempts at breaking such cryptography typically focus on factoring
> semiprimes, the approach I have identified sidesteps this challenge
> entirely. I have yet to evaluate my techniques performance on keys of
> practical length, a number of features suggest that it should be highly
> efficient.
>
> What I recognized that the goal of decrypting a message does not require
> knowing the prime factors. Cryptography is possible due to an intrinsic
> characteristic of semiprimes that there exists an exponent 'k' such that
> x^k mod n = 1 where 'n' is a semiprime number and 'x' is an arbitrary
> integer with x < n. And a number can be encrypted with the formula x^e mod
> n = y where 'y' is the encrypted message. Then decrypted with the formula
> y^d mod n = x where d+e = k+1. (Really d+e = i*k+1 where i is a positive
> integer.)
>
> The formula x^r mod n (for positive integer 'r') acts as a clock function,
> which includes a predictable periodicity. For powers if i*k, the formula
> passes through one and the next power (i*k+1) gives the original power
> back. And for the public key exponent 'e', a message can be decrypted using
> i*k+1-e, regardless of what the private key exponent is.
>
> All of that is to say that given an encrypted message and the associated
> public key, you just need to find the loop exponent 'k' and the problem
> becomes trivial.
>
> The good news is that searching for k is highly efficient. All that is
> needed is two different exponents which provide the same result. The goal
> is two exponents 'r' and 's' such that x^r mod n = x^s mod n since the
> difference will be a power which produces one in the formula. This can be
> found by testing different exponents and storing them along with the
> results in two hashes. The hash keyed in exponents avoids recalculating
> previous attempts and the hash keyed in results makes finding duplicate
> values as quick as a hash lookup.
>
> This reduces the difficulty to a birthday problem. (If you have 'n' people
> in a room, what are the odds two or more have the same birthday.
> https://en.wikipedia.org/wiki/Birthday_problem) The tested exponents can
> be spread out to limit repetition of similar distances, this means that
> each new exponent is compared against all previous ones with a portion of
> the comparisons being unique. And the unique portion will be a fraction of
> the total number of previous attempts. Which means that the searched space
> increases geometrically in both time and memory space.
>
> And that's the type of thing you want when you're trying to break
> cryptography.
>
> (If you consider the space of semiprime 'n' to be what is searched, this
> approach is actually quite a bit better than the birthday problem. You need
> 367 people to guarantee that two have the same birthday but you don't need
> to try all 'n' possible exponents but all possible distances between
> exponents which can be contained in 'n'. I'm an applied mathematician and
> never really liked number theory so I'm just going to guess and say you
> need about sqrt n tries to cover the whole space. So far, outstanding.)
>
> The other piece of really good news is that the value which is being
> searched for isn't unique. Any exponent which gives unity is sufficient for
> decryption and there are guaranteed to be at least two which are less than
> n. And the number of potential values increases with increasing key length
> (not absolutely but on average). Which means that longer keys actually give
> more chances to find a suitable exponent and the odds increase with each
> exponent tried.
>
> There's some obvious detail I've added but I feel like the takeaway is
> that simply looking for the loop exponent 'k' rather than trying to factor
> semiprimes makes the challenge of breaking RSA and related methods far
> easier. In fact, I believe that this method will allow for the cracking of
> keys with a reasonable length in a reasonable time on existing hardware.
> (My guess is that this approach may break keys in sub-linear time relative
> to key length.)
>
> Now for the bad news. The exponent identified in the search I described
> above is not the same as the 'k' I've been discussing. Each value 'x' which
> is encrypted has it's own loop exponent which can be a factor (therefore
> smaller) than the 'k' specific to the controlling semiprime. If an exponent
> which gives a result of unity for a value 'x', that exponent can be used to
> decrypt that message. But it is likely that a different exponent will be
> needed for a different message 'x' (really, message chunk since the total
> message is broken up into pieces for encryption).
>
> In order to address this, it seems wise to begin searching with an x of 2
> for greater computational efficiency. Once a working exponent is found it
> should be factored to find the smallest exponent for that message. Then, if
> actual encrypted data is available, it can be searched but only for
> exponents which are multiples of the one identified previously. And each
> time a new exponent is found, the process can be repeated to reduce the
> search time for remaining message chunks. (I know this sounds like a lot of
> work but the space searched is rapidly contracting and I'm still hoping for
> sublinear.)
>
> That's a bit of a drag but the good news about the bad news is that
> multiple message chunks can be searched in parallel without any memory
> sharing between them. So I'm guessing it should be stupid simple to
> distribute computation (for people who actually understand these things,
> which I do not).
>
> And that's it. I first stumbled across this four years ago and took it far
> enough to convince myself it works, then stopped and didn't tell anybody
> about it until last week. Now a bunch of people know, if they can recognize
> a good thing when they see it, but nobody is talking about it. Which is a
> little scary. But people should probably switch to elliptic curve based
> public key cryptography (with the strong caveat that those approaches may
> be broken by design).
>
> Feeling less secure yet?
>
> Thanks,
> Gabriel
>

// Compile with: $ g++ -std=c++11 rsa.cc -o rsa
// Call as: $ ./rsa SEMIPRIME MESSAGE SCALE
// MESSAGE and SCALE will default to 2 and 1 respectively

#include <iostream>
#include <unordered_map>
#include <boost/multiprecision/cpp_int.hpp>

using namespace std;
using big_int = boost::multiprecision::cpp_int;

// key in big_int rather than string for, probably, large performance boost
std::unordered_map<string, big_int> exph; // hash of results keyed in exponents
std::unordered_map<string, big_int> resh; // hash of exponents keyed in results
big_int dups[2] = {0, 0}; // exponents a and b giving the same result m^a % n = m^b % n

void exEucAlg (big_int a, big_int b, big_int *ret);
void hupdate(big_int pow, big_int res);
big_int powh(big_int val, big_int pow, big_int mod);
void looper(big_int N, big_int mess, int scale);

int main (int argc, char* argv[]) {
	big_int n, m = 2;
	int scale = 1;

	if (argc < 2) {
		cout << "Call with a semiprime\n\n";

		return 1;
	}

	n = big_int(argv[1]);

	if (argc >= 3) {
		m = big_int(argv[2]);

		if (argc >= 4) {
			scale = atoi(argv[3]);
		}
	}

	looper(n, m, scale);

	return 0;
}

void looper(big_int N, big_int mess, int scale) {
	big_int M = powm(mess, scale, N); // mess ** scale % N
	big_int top = N / scale; // scaling
	big_int bot = sqrt(top); // don't look for exponents less than this, fairly arbitrary
	big_int test; // stores test exponent
	int i = 3, I = 100; // counter and max for exponent stepping (by factors of (i-1)/i)

	// clear global variables for clean run
	exph.clear();
	resh.clear();
	dups[0] = 0; dups[1] = 0;

	if(M == 1) { // scale exponent satisfies mess^scale % N = 1, done
		dups[0] = scale;

		cout << "Trivial case: " << mess << " ** " << scale << " % " << N << " = 1\n\n";
		return;
	}

	powh(M, top, N); // largest exponent we'll touch

	while (i <= I and ! dups[0]) {
//		bot = i; // setting bot = i avoids rounding error infinite loop, more thorough than bot = sqrt(top)
		test = (i-1)*top/i;

		// step through test exponents, slightly smaller steps each time
		while (test > bot && ! dups[1]) {
			powh(M, test, N);
			test = (i-1)*test/i;
		}

		i++;
	}

	if(dups[1]) {
		big_int res = exph[dups[0].str()];
		dups[0] = dups[0]*scale;
		dups[1] = dups[1]*scale;

		cout << "\nSuccess " << mess << " ** " << dups[0] << " % " << N << " = " << mess << " ** " << dups[1] << " % " << N << " = " << res << "\n\n";
	} else {
		cout << "\nA miserable failure...\n\n";
	}

	cout << "Calculated " << exph.size() << " powers\n\n";

	return;
}

// hashed power function
big_int powh(big_int val, big_int exp, big_int mod) {
	if (exph.find(exp.str()) != exph.end()) {
		return exph[exp.str()];
	}

	if (exp == 1) {
		hupdate(exp, val);
		return val;
	}

	if (exp % 2) {
		hupdate(exp, (powh(val, exp-1, mod) * val) % mod);
		return exph[exp.str()];
	}

	hupdate(exp, pow(powh(val, exp/2, mod), 2) % mod);
	return exph[exp.str()];
}

// update global variables for hashed power function
void hupdate(big_int pow, big_int res) {

	if (resh.find(res.str()) != exph.end()) {
		dups[0] = pow;
		dups[1] = resh[res.str()];
	}

	exph[pow.str()] = res;
	resh[res.str()] = pow;

	return;
}

/* Extended Euclidean Algorithm from https://en.wikibooks.org/wiki/Algorithm_Implementation/Mathematics/Extended_Euclidean_algorithm */
void exEucAlg (big_int a, big_int b, big_int *ret) {
	big_int aa[2] = {1, 0};
	big_int bb[2] = {0, 1};
	big_int q, r;

	while(1) {
		q = a / b;
		a = a % b;

		aa[0] = aa[0] - q*aa[1];
		bb[0] = bb[0] - q*bb[1];

		if (a == 0) {
			ret[0] = b;
			ret[1] = aa[1];
			ret[2] = bb[1];

			return;
		}

		q = b / a;
		b = b % a;

		aa[1] = aa[1] - q*aa[0];
		bb[1] = bb[1] - q*bb[0];

		if (b == 0) {
			ret[0] = a;
			ret[1] = aa[0];
			ret[2] = bb[0];

			return;
		}
	}
}

Re: Serious vulnerability in RSA and other crypto algorithms

Reply via email to