Hi,

On Thu, 23 March 2017 8:46 pm, Adrien Prost-Boucle wrote:
>> About the pure C code, integer version that I was working on,
>> But... when I put that code in GMP code, that resulted in
>> a noticeable slowdown /o\

> Problem solved.
> Branch prediction made GMP's sqrtrem1 appear faster than it actually
> is on normal pseudo-random workloads.
> So, I have a working and exhaustively tested C version for sqrtrem1,
> that is slightly faster than GMP's.
> Patch coming soon.

I'll be happy to examine it.

Your observations call for some investigation...
Is your version faster because of a faster core sequence, or thanks to
improved handling of the possible branches? Can the proposed branch
structure also be applied to the current code, or is it strictly linked
to the new core?

We should probably move to SQRTREM1_NEEDNORM=0 (notation from Adrien's
patch) as soon as sqrtrem2 no longer needs rem1, to avoid, at least,
the double computation of the remainder for non-normalised inputs.

> Do I do it based on rev 17327 or on my previous patch that uses
> FP instructions on x86-64?

The two patches are orthogonal; I suggest not mixing them.

Regards,
m

-- 
http://bodrato.it/papers/
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
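
For illustration only, here is a minimal sketch of a one-limb square root that accepts non-normalised input and yields the remainder as a by-product of a single pass, which is the property the SQRTREM1_NEEDNORM=0 remark is about. It is neither GMP's sqrtrem1 nor the code from Adrien's patch: it uses the plain bit-by-bit method, and the names sqrtrem1_sketch and check_exhaustive are hypothetical. It also assumes a 64-bit limb.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-limb sqrt with remainder (bit-by-bit method).
 * Accepts any 64-bit input, normalised or not.
 * Returns s = floor(sqrt(a)) and stores a - s*s in *rem. */
static uint64_t sqrtrem1_sketch(uint64_t a, uint64_t *rem)
{
    uint64_t root = 0;
    uint64_t bit = (uint64_t)1 << 62;   /* highest power of four in a limb */

    /* Skip the leading powers of four that exceed the input, so that
     * small (non-normalised) inputs do not pay for unused iterations. */
    while (bit > a)
        bit >>= 2;

    while (bit != 0) {
        if (a >= root + bit) {
            a -= root + bit;            /* accept this bit of the root */
            root = (root >> 1) + bit;
        } else {
            root >>= 1;                 /* reject this bit */
        }
        bit >>= 2;
    }

    *rem = a;   /* what is left of the input is the remainder */
    return root;
}

/* Exhaustive check over [0, limit): s*s + r == a and r <= 2*s. */
static void check_exhaustive(uint64_t limit)
{
    for (uint64_t a = 0; a < limit; a++) {
        uint64_t r;
        uint64_t s = sqrtrem1_sketch(a, &r);
        assert(s * s + r == a);
        assert(r <= 2 * s);
    }
}
```

Both loops here branch on the data, so a sketch of this kind is sensitive to exactly the branch-prediction effects discussed above; it is meant to show the interface, not to be competitive.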