On Saturday 31 December 2011 20:55:47 Bill Hart wrote:
> Would it be possible to set a flag HAVE_MPN_SUMDIFF_N in mpir.h if
> this function is available and similarly for addsub? That way code
> like mine could use these functions when they are available by
> conditionally including alternatives in the cases where these defines
> don't exist.
>

The difficulty is that the HAVE_NATIVE defines are in config.h along with all
the HAVE_SYSTIME, VERSION, PACKAGE defines. If we just catted config.h onto
mpir.h then we could easily break some other library built on top of mpir. For
development it would be OK though. The other option is to separate the
HAVE_NATIVE defines out, but it is not clear how to do that in the mess that
is configure.

> Obviously this won't be a problem when the code goes into mpir, but
> this sort of thing is a strong disincentive to writing fast mpn code
> as it is much easier to do as standalone than as part of mpir,
> initially.

Yeah, I prefer to develop outside mpir.

>
> Bill.
>
> On 31 December 2011 18:48, Bill Hart <[email protected]> wrote:
> > For the FFT the aliasing is not so important, as the butterfly can be
> > done into temporary space then swapped. I use this strategy a lot and
> > it doesn't affect the FFT timings.
> >
> > Bill
> >
> > On 31 December 2011 18:22, Jason <[email protected]> wrote:
> >> On Saturday 31 December 2011 17:47:03 Jason wrote:
> >>> On Tuesday 27 December 2011 17:27:48 Bill Hart wrote:
> >>> > In my FFT I make use of mpn_sumdiff_n and mpn_addsub_n. It seems these
> >>> > are not exported even though there are generic C versions.
> >>> >
> >>> > Also, I see there is no sumdiff_n.as on core2 style machines. Is it
> >>> > possible to include mpn_sumdiff_n.c in the library on such machines so
> >>> > that it is included unconditionally for all machines?
> >>> >
> >>>
> >>> we would still have the other arches to do, i.e. power, arm etc. To make
> >>> it unconditional, addsub needs to allocate some tmp space. I suppose we
> >>> could split the addsub into various overlap cases, this may be possible,
> >>> but for sumdiff I don't think it is
> >>>
> >>
> >> addsub is possible, and so is addadd, although the case addadd_n(t,x,y,z)
> >> where t=x=y=z requires mul_1(t,x,3), which on core2 and sandybridge is the
> >> same speed as two adds; don't know about the other arches. Although if we
> >> consider this a rare case then it may not be important. For sumdiff the
> >> only difficult case is when the sum and difference are aliased with both
> >> the sources. We could exclude this overlap condition? It would also relax
> >> the instruction ordering, which would ease up finding faster asm versions
> >>
> >>
> >>> > Is there a reason to not have an assembly optimised version for core2?
> >>> >
> >>>
> >>> I haven't found one for core2 or sandybridge which is faster than a
> >>> separate add and sub
> >>>
> >>> > Bill.
> >>> >
> >>> >
> >>>
> >>>
> >>
> >> --
> >> You received this message because you are subscribed to the Google Groups
> >> "mpir-devel" group.
> >> To post to this group, send email to [email protected].
> >> To unsubscribe from this group, send email to
> >> [email protected].
> >> For more options, visit this group at
> >> http://groups.google.com/group/mpir-devel?hl=en.
> >>
> >
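
For anyone following the thread, here is a rough sketch of the "conditionally including alternatives" idea, in standalone C. It is illustrative only and rests on several assumptions: HAVE_MPN_SUMDIFF_N is the flag proposed above and is not known to be defined by mpir.h; limb_t and my_sumdiff_n are hypothetical names; the mpn_sumdiff_n argument order and the 2*carry + borrow return convention are assumed rather than confirmed. The fallback is a single fused limb-by-limb loop, which is a different technique from the separate add and sub passes discussed in the thread. Because it reads both source limbs before writing either result, it tolerates the sum and difference being aliased exactly with the sources without any temporary space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for mp_limb_t so the sketch compiles without mpir.h (assumption). */
typedef uint64_t limb_t;

#ifdef HAVE_MPN_SUMDIFF_N
/* Hypothetical: the flag proposed in the thread is defined, so defer to the
   library routine. Assumes mpir.h is included and limb_t matches mp_limb_t. */
#define my_sumdiff_n(s, d, x, y, n) mpn_sumdiff_n((s), (d), (x), (y), (n))
#else
/* Portable fallback: s = x + y and d = x - y over n limbs.
   Returns 2*carry + borrow (assumed to match the library convention).
   s and d may each coincide exactly with x or y; s and d must be distinct,
   and partial overlaps are not supported. */
static limb_t my_sumdiff_n(limb_t *s, limb_t *d,
                           const limb_t *x, const limb_t *y, size_t n)
{
    limb_t carry = 0, borrow = 0;
    size_t i;

    assert(n == 0 || s != d);
    for (i = 0; i < n; i++) {
        /* Read both sources before writing, so exact aliasing is safe. */
        limb_t a = x[i], b = y[i];

        limb_t sum = a + b;
        limb_t c1 = sum < a;          /* carry out of a + b */
        sum += carry;
        limb_t c2 = sum < carry;      /* carry out of adding the carry-in */

        limb_t diff = a - b;
        limb_t b1 = a < b;            /* borrow out of a - b */
        limb_t b2 = diff < borrow;    /* borrow out of subtracting borrow-in */
        diff -= borrow;

        s[i] = sum;
        d[i] = diff;
        carry = c1 | c2;
        borrow = b1 | b2;
    }
    return 2 * carry + borrow;
}
#endif
```

Calling my_sumdiff_n(x, y, x, y, n) overwrites x with the sum and y with the difference, which is the overlap case described above as the difficult one for sumdiff. Whether a loop like this is competitive with separate add and sub passes on any given machine is not something this sketch claims.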
