On Wed, Oct 15, 2014 at 11:08 AM, Steven G. Johnson
<[email protected]> wrote:
>
>
> On Wednesday, October 15, 2014 8:59:38 AM UTC-4, Erik Schnetter wrote:
>>
>> Modern x86 CPUs handle floats at about twice the speed of doubles. A
>> floating-point instruction usually takes one cycle, and each
>> instruction can execute multiple operations due to vectorization. With
>> doubles (in 256-bit AVX registers), you can have 4 operations per
>> instruction, and with floats, you can have 8 operations per instruction.
>
>
> That assumes that everything obtains optimal SIMD vectorization, which is
> usually false.

The original question stated "most time is spent in BLAS", in
particular in axpy. We can safely assume that axpy is vectorized.
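To make the lane arithmetic concrete, here is a minimal sketch. It assumes 256-bit AVX registers; SSE (128-bit) or AVX-512 (512-bit) change the absolute counts but not the 2:1 float/double ratio. The function names (`lanes`, `axpy`, `vector_ops`) are hypothetical illustrations, not any BLAS API.

```python
# Assumption: 256-bit AVX registers, as on typical 2014-era x86 CPUs.
REGISTER_BITS = 256


def lanes(element_bits: int, register_bits: int = REGISTER_BITS) -> int:
    """Number of elements that fit in one SIMD register."""
    return register_bits // element_bits


def axpy(a, x, y):
    """Scalar reference axpy: returns a*x + y element-wise.

    A vectorized BLAS axpy does the same work, but processes
    `lanes(...)` elements per vector instruction.
    """
    return [a * xi + yi for xi, yi in zip(x, y)]


def vector_ops(n: int, element_bits: int) -> int:
    """Full-width vector multiply-adds needed for an n-element axpy.

    Uses ceiling division, so a partial final register counts as one op.
    """
    w = lanes(element_bits)
    return -(-n // w)


if __name__ == "__main__":
    print(lanes(64), lanes(32))                      # doubles vs floats per register
    print(vector_ops(1000, 64), vector_ops(1000, 32))  # double vs float axpy
```

For 1000 elements this gives 250 vector operations in double precision versus 125 in single precision, which is where the "about twice the speed" comes from once axpy is vectorized.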

-erik

-- 
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/
