I don't know whether this is currently possible, but it would be nice to have.
It seems I could, in theory, speed up the execution of this code:

    proc segsieve(Kmax: uint, KB: int) =      # for Kn resgroups|bytes in segment
      let Ks = KB                             # make default seg size immutable
      parallel:                               # perform SSoZ in parallel
        for r in 0..rescnt-1:                 # for each residue track number 'r'
          let nextp_row = r * pcnt            # set the 'nextp' table row address
          let seg_row = r * Ks                # set the 'seg' memory row address
          spawn residue_sieve(nextp_row, seg_row, Kmax, Ks, r) # do sieve for row 'r'
      sync()                                  # wait for all row threads to finish
      for i in 0..rescnt-1:                   # update 'primecnt' with the count of
        primecnt += cnts[i]                   # segment primes for each 'seg' row

Here `sync()` makes the code that follows wait until all the threads have
finished executing. It should be theoretically possible to speed up overall
execution by having each thread asynchronously put its `cnt` into a thread
queue (FIFO), from which the values are extracted and added to `primecnt`.
Since there is a known number of `cnt` values here (`rescnt` of them),
`primecnt` can be updated as these values become available until `rescnt`
have been added. Is this possible now? Could it be faster?
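
To make the idea concrete, here is a minimal sketch of what I have in mind. It is not my real code: it swaps `parallel`/`spawn` for plain threads plus the built-in `Channel` type, and `residueSieveStub`, `segsieveSketch`, `cntChan` and the value of `rescnt` are all made up for illustration. Each worker pushes its count into the channel, and the main thread adds exactly `rescnt` values as they arrive instead of waiting on `sync()` first.

```nim
# Compile with: nim c --threads:on sketch.nim
const rescnt = 8                  # hypothetical number of residue tracks

var cntChan: Channel[uint]        # FIFO queue shared by all worker threads

proc residueSieveStub(r: int) {.thread.} =
  # Stand-in for residue_sieve: pretend row 'r' found r+1 primes
  # and push that count into the queue instead of writing cnts[r].
  cntChan.send(uint(r + 1))

proc segsieveSketch(): uint =
  var workers: array[rescnt, Thread[int]]
  cntChan.open()
  for r in 0..rescnt-1:           # start one worker per residue track
    createThread(workers[r], residueSieveStub, r)
  for _ in 1..rescnt:             # consume counts in arrival order;
    result += cntChan.recv()      # recv() blocks until a value is available
  joinThreads(workers)            # all counts are in, so this returns quickly
  cntChan.close()

echo segsieveSketch()             # prints 36 (1 + 2 + ... + 8)
```

The sketch only shows that the pattern can be expressed. It says nothing about whether it is actually faster, since the final loop it replaces is just `rescnt` additions.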