On Thursday 28 February 2002 22:03, Guillermo Ballester Valor wrote:
> Hi,
>
> On Thu 28 Feb 2002 22:19, Brian J. Beesley wrote:
[... snip ...]
> > The difference here is that your method generates memory bus traffic at
> > twice the rate. George's method takes advantage of the fact that (with
> > properly aligned operands) fetching the "odd" element data automatically
> > fetches the adjacent "even" element data.
>
> The streams would be alternated :  stream0_data(n) , stream1_data(n),
> stream0_data(n+1), stream1_data(n+1)...
>
> When fetching data(n) for a stream we also get the other.

Yes, this scheme does seem to work.
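
As a rough illustration of the interleaved layout described above, here is a
minimal sketch in Python. The names interleave and fetch_pair are hypothetical
and are not taken from any existing client code. fetch_pair only models the
idea that one aligned 16-byte fetch returns element n of both streams.

```python
def interleave(stream0, stream1):
    """Lay out two equal-length streams as s0[0], s1[0], s0[1], s1[1], ..."""
    if len(stream0) != len(stream1):
        raise ValueError("streams must have the same length")
    out = []
    for a, b in zip(stream0, stream1):
        out.extend((a, b))
    return out


def fetch_pair(buffer, n):
    """Model one aligned fetch: returns element n of stream 0 and stream 1."""
    return buffer[2 * n], buffer[2 * n + 1]


if __name__ == "__main__":
    buf = interleave([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
    print(buf)
    print(fetch_pair(buf, 1))
```

With this layout a single fetch at index n serves both streams, so neither
stream needs a separate pass over memory.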
>
> The memory bottleneck was the first thing I thought of, and I was close to
> discarding the idea when I realized that the trig data would be the same, and
> the required memory access would be less than double that of the single
> stream scheme. If a double stream version costs less than double the single
> one, then we can speed up the project a bit.

On Friday 01 March 2002 00:37, George Woltman wrote:
>
> Well, that would be true if SSE2 had a multiply vector by scalar
> instruction. That is, to multiply two values by the same trig value, you
> must either load two copies of the trig value or add instructions to copy
> the value into both halves of the SSE2 register.

I can't see that being a major problem. Surely there's only one main memory 
fetch to load the two halves of the SSE2 register with the same value, and 
surely the loads can be done in parallel since there's no interaction.
( M -> X; then X -> R1 & X -> R2 in parallel, where X is one of the temporary 
registers available to the pipeline)
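
The load-once-then-copy idea can be modelled as follows. This is a Python
sketch only, not SSE2 code: the Memory class, broadcast, packed_mul and
scale_pair are hypothetical names, and a two-element tuple stands in for the
two halves of an SSE2 register. The load counter is there to show that only
one memory fetch of the trig value is needed per pair.

```python
class Memory:
    """Toy memory that counts how many loads are issued."""

    def __init__(self, values):
        self.values = list(values)
        self.loads = 0

    def load(self, index):
        self.loads += 1
        return self.values[index]


def broadcast(x):
    """Copy one scalar into both halves of a modelled register."""
    return (x, x)


def packed_mul(a, b):
    """Element-wise multiply of two modelled two-element registers."""
    return (a[0] * b[0], a[1] * b[1])


def scale_pair(trig_table, k, pair):
    """Multiply element n of both streams by the same trig value."""
    t = trig_table.load(k)                 # M -> X: one memory fetch
    return packed_mul(broadcast(t), pair)  # X -> both halves, then multiply


if __name__ == "__main__":
    trig = Memory([0.5, 0.25])
    print(scale_pair(trig, 0, (4.0, 8.0)), trig.loads)
```

The extra cost George describes shows up here as the broadcast step: it is
register-to-register work, not additional memory traffic.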

On Thursday 28 February 2002 21:20, Steinar H. Gunderson wrote:
>
> Testing a number in parallel with itself is obviously a bad idea if there
> occurs an undetected error. :-)

Sure. But the only way there would be a problem here (given that the data 
values are independent because of the different random offsets) is if there 
was a major error like miscounting the number of iterations. This is 
relatively easy to test out.

I'm sort of marginally uneasy, rather than terrified, about running a 
double-check in parallel with the first test on the same system at the same 
time. Also, I think most people would rather complete one assignment in time 
T rather than two assignments in time 2T with both results unknown till they 
both complete. Against this is that Guillermo's suggestion does something to 
counter the relatively low rate at which DCs are completed.

Regards
Brian Beesley

_________________________________________________________________________
Unsubscribe & list info -- http://www.ndatech.com/mersenne/signup.htm
Mersenne Prime FAQ      -- http://www.tasam.com/~lrwiman/FAQ-mers
