I've tried it both with and without `parallel:`, as shown below, but I get the same
compiler output either way.
```nim
parallel:
  var cnt = 0                        # count for the segment primes '1' bytes
  for i in 0..<rescnt:               # count Kn resgroups|bytes along each restrack
    cnt += spawn segcount(i*KB, Kn)
  sync()
primecnt += cnt.uint                 # update primecnt for the segment
```
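For comparison, here is a sketch of the pattern the Nim manual uses inside a `parallel` section. Each `spawn` result is assigned to its own array element, and the elements are summed after the section ends, rather than being accumulated into one shared variable. The sketch rests on a few assumptions. `countSegment` and `SegPtr` are hypothetical names. The segment bytes are passed as a raw pointer instead of being read from a global `seq`, because spawned procs must be GC-safe, and reading a global `seq` may fail that check.

```nim
# Nim 1.x: compile with --threads:on (Nim 2.x enables threads by default)
import std/threadpool

{.experimental: "parallel".}

type SegPtr = ptr UncheckedArray[uint8]   # hypothetical raw view of the segment bytes

# Sum the bytes ('1' = prime, '0' = nonprime) of one row of kn bytes at offset `row`.
proc segcount(seg: SegPtr, row, kn: int): int =
  for k in 0..<kn:
    result += seg[row + k].int

# Hypothetical helper: count every row in parallel, one result slot per row.
proc countSegment(seg: var seq[uint8], rescnt, kb, kn: int): int =
  let p = cast[SegPtr](seg[0].addr)
  var counts = newSeq[int](rescnt)
  parallel:
    # iterate 0..counts.high so the parallel checker can prove the index is in bounds
    for i in 0..counts.high:
      counts[i] = spawn segcount(p, i * kb, kn)
  # the parallel section waits for all spawns, so the slots are filled here
  for c in counts:
    result += c
```

Writing to `counts[i]` gives every spawned task its own provably disjoint location, which is what the `parallel` checker is designed to verify.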
I'm also using `spawn` earlier in `segsieve`, and that works with no problems. It
really does run in parallel, which I can verify by watching the program in `htop`.
Below is the complete `segsieve` code, along with the two procs it spawns.
```nim
# This routine performs the prime sieve for a restrack of Kn resgroups|bytes.
# 'nextp' resgroup vals for restrack 'r' mark prime multiples on it in 'seg'
# and are updated for each prime for the next segment.
proc residue_sieve(row: int, seg_rti: int, Kn: int) =
  for j, prime in primes:              # for each prime r1..sqrt(N)
    if nextp[row+j] < Kn.uint:         # if 1st mult resgroup is within 'seg'
      var k = nextp[row+j].int         # starting from this resgroup in 'seg'
      while k < Kn:                    # for each primenth byte to end of 'seg'
        seg[seg_rti + k] = 0           # mark byte in segment as nonprime
        k += prime                     # compute next prime multiple resgroup
      nextp[row+j] = uint(k - Kn)      # save 1st resgroup in next eligible seg
    else: nextp[row+j] -= Kn.uint      # do if 1st mult resgroup not within seg

# Count the primes on each row of Kn resgroups|bytes in 'seg' memory.
proc segcount(row, Kn: int): int =     # for this row in 'seg' of Kn bytes
  var cnt = 0
  for k in 0..<Kn: cnt += seg[row + k].int  # add primes '1' (and nonprimes '0')
  result = cnt                         # return count of primes for 'row'

# This routine performs the total prime sieve for Kn resgroups|bytes by
# processing each residue track individually (in parallel). Then the
# segment primes count is computed and added to global var 'primecnt'.
proc segsieve(Kn: int) =               # for Kn resgroups in segment
  for b in 0..<seg.len: seg[b] = 1     # initialize seg bytes to all prime '1'
  parallel:
    for r in 0..<rescnt:               # for each residue track number 'r'
      let row = r * pcnt               # set the 'nextp' table row address
      let seg_rti = r * KB             # set the segment mem row address
      spawn residue_sieve(row, seg_rti, Kn)  # mark the prime multiples along it
  sync()
  #parallel:
  var cnt = 0                          # count of the primes, the '1' bytes
  for i in 0..<rescnt:                 # count Kn resgroups along each restrack
    #cnt += segcount(i*KB, Kn)
    cnt += spawn segcount(i*KB, Kn)
  sync()
  primecnt += cnt.uint
```
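Outside a `parallel` section, `spawn` on a proc that returns a value gives back a `FlowVar[int]`, not an `int`, so `cnt += spawn segcount(...)` cannot add it directly. One way around this is to keep each `FlowVar` and read it with `^`, which blocks until that task finishes. The sketch below makes some assumptions. `countSegment` and `SegPtr` are hypothetical names, and the segment is passed as a raw pointer so that the spawned proc does not touch a global `seq`.

```nim
# Nim 1.x: compile with --threads:on (Nim 2.x enables threads by default)
import std/threadpool

type SegPtr = ptr UncheckedArray[uint8]   # hypothetical raw view of the segment bytes

# Sum the bytes ('1' = prime, '0' = nonprime) of one row of kn bytes at offset `row`.
proc segcount(seg: SegPtr, row, kn: int): int =
  for k in 0..<kn:
    result += seg[row + k].int

# Hypothetical helper: spawn one count per row, then collect the results.
proc countSegment(seg: var seq[uint8], rescnt, kb, kn: int): int =
  let p = cast[SegPtr](seg[0].addr)
  var counts = newSeq[FlowVar[int]](rescnt)
  for i in 0..<rescnt:
    counts[i] = spawn segcount(p, i * kb, kn)   # returns immediately with a FlowVar
  for fv in counts:
    result += ^fv                               # blocks until this row's count is ready
```

Because `^` already waits for each task, no separate `sync()` is needed before summing.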
I'm trying to get `segcount` to run in parallel too, which should make the
program even faster. Once this works, I'll write it all up and update my paper
to show the new parallel algorithm architecture and the Nim implementation.