On Saturday 31 December 2011 20:55:47 Bill Hart wrote:
> Would it be possible to set a flag HAVE_MPN_SUMDIFF_N in mpir.h if
> this function is available and similarly for addsub? That way code
> like mine could use these functions when they are available by
> conditionally including alternatives in the cases where these defines
> don't exist.
> 

The difficulty is that the HAVE_NATIVE defines live in config.h alongside all
the HAVE_SYSTIME, VERSION and PACKAGE defines. If we just catted config.h onto
mpir.h we could easily break some other library built on top of mpir, though
for development it would be OK. The other option is to separate the
HAVE_NATIVE defines out, but it is not clear how to do that in the mess that
is configure.
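
For reference, a minimal sketch of how client code could use such a flag if
mpir.h did export it. HAVE_MPN_SUMDIFF_N is the macro proposed above and does
not exist in mpir.h today. limb_t, ref_add_n, ref_sub_n and fft_sumdiff_n are
hypothetical names, and the 2*carry + borrow return value is an assumption
about the convention, not something confirmed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In real code you would #include <mpir.h> here and use mp_limb_t.
   It is left out so the sketch compiles standalone. */
typedef uint64_t limb_t;            /* stand-in for mp_limb_t */

/* rp = xp + yp over n limbs, returns the carry out (0 or 1). */
static limb_t ref_add_n(limb_t *rp, const limb_t *xp, const limb_t *yp, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t x = xp[i], y = yp[i];
        limb_t s = x + y;
        limb_t c1 = s < x;          /* carry from x + y */
        limb_t r = s + cy;
        limb_t c2 = r < s;          /* carry from adding the incoming carry */
        rp[i] = r;
        cy = c1 | c2;
    }
    return cy;
}

/* rp = xp - yp over n limbs, returns the borrow out (0 or 1). */
static limb_t ref_sub_n(limb_t *rp, const limb_t *xp, const limb_t *yp, size_t n)
{
    limb_t bw = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t x = xp[i], y = yp[i];
        limb_t d = x - y;
        limb_t b1 = x < y;          /* borrow from x - y */
        limb_t r = d - bw;
        limb_t b2 = d < bw;         /* borrow from subtracting the incoming borrow */
        rp[i] = r;
        bw = b1 | b2;
    }
    return bw;
}

#ifdef HAVE_MPN_SUMDIFF_N
/* Hypothetical: the library advertised a native routine, so use it. */
#define fft_sumdiff_n(sp, dp, xp, yp, n) mpn_sumdiff_n(sp, dp, xp, yp, n)
#else
/* Fallback: separate add and sub. The outputs must not overlap the inputs,
   because the sum is written before the difference is read. */
static limb_t fft_sumdiff_n(limb_t *sp, limb_t *dp,
                            const limb_t *xp, const limb_t *yp, size_t n)
{
    assert(n > 0);
    limb_t cy = ref_add_n(sp, xp, yp, n);
    limb_t bw = ref_sub_n(dp, xp, yp, n);
    return 2 * cy + bw;             /* assumed return convention */
}
#endif
```

With the macro undefined the fallback is compiled, so the caller only ever
writes fft_sumdiff_n and does not care which version it gets.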

> Obviously this won't be a problem when the code goes into mpir, but
> this sort of thing is a strong disincentive to writing fast mpn code
> as it is much easier to do as standalone than as part of mpir,
> initially.

Yeah, I prefer to develop outside mpir.

> 
> Bill.
> 
> On 31 December 2011 18:48, Bill Hart <[email protected]> wrote:
> > For the FFT the aliasing is not so important, as the butterfly can be
> > done into temporary space then swapped. I use this strategy a lot and
> > it doesn't affect the FFT timings.
> >
> > Bill
> >
> > On 31 December 2011 18:22, Jason <[email protected]> wrote:
> >> On Saturday 31 December 2011 17:47:03 Jason wrote:
> >>> On Tuesday 27 December 2011 17:27:48 Bill Hart wrote:
> >>> > In my FFT I make use of mpn_sumdiff_n and mpn_addsub_n. It seems these
> >>> > are not exported even though there are generic C versions.
> >>> >
> >>> > Also, I see there is no sumdiff_n.as on core2 style machines. Is it
> >>> > possible to include mpn_sumdiff_n.c in the library on such machines so
> >>> > that it is included unconditionally for all machines?
> >>> >
> >>>
> >>> We would still have the other arches to do, i.e. power, arm etc. To make it
> >>> unconditional, addsub needs to allocate some tmp space. I suppose we could
> >>> split addsub into the various overlap cases, which may be possible, but
> >>> for sumdiff I don't think it is.
> >>>
> >>
> >> addsub is possible, and so is addadd, although the case addadd_n(t,x,y,z)
> >> where t=x=y=z requires mul_1(t,x,3), which on core2 and sandybridge is the
> >> same speed as two adds. I don't know about the other arches, although if we
> >> consider this a rare case then it may not be important. For sumdiff the only
> >> difficult case is when the sum and difference are aliased with both the
> >> sources. Could we exclude this overlap condition? It would also relax
> >> the instruction ordering, which would make it easier to find faster asm
> >> versions.
> >>
> >>
> >>> > Is there a reason to not have an assembly optimised version for core2?
> >>> >
> >>>
> >>> I haven't found one for core2 or sandybridge which is faster than a
> >>> separate add and sub.
> >>>
> >>> > Bill.
> >>> >
> >>> >
> >>>
> >>>
> >>
> >> --
> >> You received this message because you are subscribed to the Google Groups 
> >> "mpir-devel" group.
> >> To post to this group, send email to [email protected].
> >> To unsubscribe from this group, send email to 
> >> [email protected].
> >> For more options, visit this group at 
> >> http://groups.google.com/group/mpir-devel?hl=en.
> >>
> 
> 

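The temporary-space strategy Bill mentions in the quoted thread above, where
the butterfly is written to scratch buffers and the pointers are then swapped,
can be sketched roughly as follows. This is only an illustration under stated
assumptions: limb_t, add_n, sub_n and butterfly_swap are hypothetical names,
the arithmetic is plain mod 2^(64n) with carries discarded, and a real FFT
butterfly would also do the modular reduction and twiddle shifts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;            /* stand-in for mp_limb_t */

/* rp = xp + yp over n limbs, returns the carry out. */
static limb_t add_n(limb_t *rp, const limb_t *xp, const limb_t *yp, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t x = xp[i], y = yp[i];
        limb_t s = x + y;
        limb_t r = s + cy;
        cy = (limb_t)(s < x) | (limb_t)(r < s);
        rp[i] = r;
    }
    return cy;
}

/* rp = xp - yp over n limbs, returns the borrow out. */
static limb_t sub_n(limb_t *rp, const limb_t *xp, const limb_t *yp, size_t n)
{
    limb_t bw = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t x = xp[i], y = yp[i];
        limb_t d = x - y;
        limb_t r = d - bw;
        bw = (limb_t)(x < y) | (limb_t)(d < bw);
        rp[i] = r;
    }
    return bw;
}

/* Butterfly (x, y) -> (x + y, x - y). The results are written to the scratch
   buffers *t0 and *t1, so the routine never needs the fully aliased case.
   The pointers are then exchanged: afterwards *x and *y point at the results
   and *t0 and *t1 point at the old inputs, ready to be reused as scratch. */
static void butterfly_swap(limb_t **x, limb_t **y,
                           limb_t **t0, limb_t **t1, size_t n)
{
    limb_t *p;

    /* scratch must be distinct from the inputs and from each other */
    assert(*t0 != *x && *t0 != *y && *t1 != *x && *t1 != *y && *t0 != *t1);

    (void)add_n(*t0, *x, *y, n);    /* carry ignored in this sketch */
    (void)sub_n(*t1, *x, *y, n);    /* borrow ignored in this sketch */

    p = *x; *x = *t0; *t0 = p;
    p = *y; *y = *t1; *t1 = p;
}
```

The swap only touches pointers, so no limbs are copied. That fits Bill's
remark that the strategy does not affect the FFT timings.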