From: bodr...@mail.dm.unipi.it
Date: Fri, 4 Jan 2013 02:57:50 +0100 (CET)

> On Fri, 4 January 2013 1:49 am, David Miller wrote:
>> Just FYI, I'm also working on an mpn_mul_basecase that makes use of
>> the T4 'mpmul' instruction which can do NxN 64-bit limb multiplies
>> for values of N from 1 to 32.
>
> Great! Maybe it can be useful also for mul_2 or higher.

Indeed. One of the things I need to work on is determining where the
cut-off is for when 'mpmul' is actually faster than the usual
mulx/umulxhi implementation.

>> It's an instruction that seems like it was designed specifically for
>> libgmp :-)
>
> If it supports only balanced multiplication (NxN and not NxM), its target
> is probably 2048-bit public-key crypto.

The chip has separate Montgomery multiply and Montgomery squaring
instructions for public-key crypto, and they are already in use in
the openssl tree, for example.

Yes, the mpmul instruction is limited to balanced NxN multiplies.

Well, actually, we could use this mpmul instruction for NxM cases by
padding the unused parameters with zeros. That way we could support
any case where N <= 32 and M <= 32.

> Should we add a balanced-only mul_basecase_n function, to be used by
> mul_n, to fully exploit such an instruction? Modular arithmetic (crypto,
> ECM, etc.) can benefit from such an approach. How much faster than a
> fully-flexible mul_basecase would it be?

Making this for crypto would be of no value for T4 because, as
mentioned, the chip has other instructions that more directly support
modular arithmetic, in the form of the 'montmul' and 'montsqr'
instructions.

_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
http://gmplib.org/mailman/listinfo/gmp-devel
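
To make the zero-padding idea for NxM operands concrete, here is a
minimal C sketch. It is not the actual T4 code: balanced_mul_n() is a
hypothetical, portable schoolbook stand-in for the 'mpmul' instruction,
mul_64x64() stands in for the mulx/umulxhi pair, and all names
(limb_t, MPMUL_MAX_LIMBS, mul_via_padding) are made up for
illustration. The only facts taken from the thread are the balanced
NxN restriction and the limit of 32 limbs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MPMUL_MAX_LIMBS 32   /* per the thread: N ranges from 1 to 32 */

typedef uint64_t limb_t;     /* hypothetical limb type, 64 bits */

/* Portable 64x64 -> 128-bit product (stand-in for mulx/umulxhi). */
static void mul_64x64(limb_t a, limb_t b, limb_t *hi, limb_t *lo)
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;
    uint64_t p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
    /* Middle column: at most 3 * (2^32 - 1), so it cannot overflow. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    *lo = (mid << 32) | (uint32_t)p00;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* Hypothetical stand-in for the balanced-only hardware multiply:
   rp[0..2n-1] = ap[0..n-1] * bp[0..n-1]. */
static void balanced_mul_n(limb_t *rp, const limb_t *ap,
                           const limb_t *bp, size_t n)
{
    assert(n >= 1 && n <= MPMUL_MAX_LIMBS);
    memset(rp, 0, 2 * n * sizeof(limb_t));
    for (size_t i = 0; i < n; i++) {
        limb_t carry = 0;
        for (size_t j = 0; j < n; j++) {
            limb_t hi, lo;
            mul_64x64(ap[j], bp[i], &hi, &lo);
            lo += carry;      hi += (lo < carry);      /* add carry-in  */
            lo += rp[i + j];  hi += (lo < rp[i + j]);  /* accumulate    */
            rp[i + j] = lo;
            carry = hi;
        }
        rp[i + n] = carry;
    }
}

/* NxM multiply through the balanced primitive: zero-pad both operands
   to n = max(an, bn) limbs, multiply n x n, keep the low an+bn limbs.
   Returns 0 on success, -1 if a size is outside 1..32. */
static int mul_via_padding(limb_t *rp, const limb_t *ap, size_t an,
                           const limb_t *bp, size_t bn)
{
    size_t n = an > bn ? an : bn;
    limb_t pa[MPMUL_MAX_LIMBS] = {0};
    limb_t pb[MPMUL_MAX_LIMBS] = {0};
    limb_t prod[2 * MPMUL_MAX_LIMBS];

    if (an == 0 || bn == 0 || n > MPMUL_MAX_LIMBS)
        return -1;

    memcpy(pa, ap, an * sizeof(limb_t));
    memcpy(pb, bp, bn * sizeof(limb_t));
    balanced_mul_n(prod, pa, pb, n);
    /* The product is below 2^(64*(an+bn)), so the dropped limbs are zero. */
    memcpy(rp, prod, (an + bn) * sizeof(limb_t));
    return 0;
}
```

Note that padding a 32x1 multiply this way still costs a full 32x32
multiply, so for very unbalanced operands the padded path may well
lose to the ordinary basecase loop. That would need to be measured
along with the cut-off mentioned above.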