Re: [Cython] Implementation of complex division for the struct case

Carl Witty Thu, 14 May 2009 16:32:47 -0700

On Thu, May 14, 2009 at 4:07 PM, Robert Bradshaw
<[email protected]> wrote:
> On May 14, 2009, at 3:56 PM, Lisandro Dalcin wrote:
>
>> I used to teach numerical calculus, so I cannot resist :-)
>
> Thanks for the heads up--I deal mostly with exact arithmetic (e.g.
> when writing that code I had the gaussian integers in my head) so
> it's good to have an expert speak up.
>
>> The current implementation of complex division for the struct case is
>> the dangerous one, it uses:
>>
>> 2 divides
>> 6 multiplies
>> 3 additions
>>
>> A safer way, far less sensible to roundoff/overflow (looks at core
>> CPython's complexobject.c, and skip the checks for dividing by zero),
>> would likely require:
>>
>> 2 comparison
>> 3 divides
>> 3 multiplies
>> 3 additions
>>
>> So the key question is (having a serious headache right now, hope I
>> got the number above and below right):
>>
>> Are 2 comparison and 1 divide SLOWER than 3 multiplies ??
>>
>> If the answer is NO, we should clearly use the SAFER implementation.
>> If the answer is YES, should we trade SPEED over roundoff/overflow
>> issues?
>
> No idea--please post some benchmarks. It looks like more than 2
> comparisons, as one has to take absolute values too. Branching can be
> expensive, but it's clearly not obvious enough to tell just by
> talking about it. If they're close, safer is better for sure,
> otherwise we'll have to really have to weight the pros and cons.


Absolute value need not require a comparison.  On x86 when using the
old 8087-based floating point model, there is an fabs instruction; in
other cases, you can do absolute value with a bitwise AND (and gcc
does so, at least when using SSE2 for the x86).

The cost of comparisons and branches will depend a lot on context.
(Does the branch go the same way almost every time a division is
performed?  Are divisions performed often enough to keep the branches
in the branch prediction cache?)

Divisions are very expensive, but the relative cost of division and
multiplication will obviously depend on the architecture, etc.

In short: Yes, you should do benchmarks :)

But benchmarks should be careful to have a realistic mix of numbers to
divide, to properly test the branch prediction.  For example, compute
a couple of arrays of random complexes, and then benchmark elementwise
division of these arrays (assuming that we think that random complex
numbers are "realistic").

Carl
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev

Re: [Cython] Implementation of complex division for the struct case

Reply via email to