@Kristoffer Carlsson, I do appreciate your help and your clever use of 
Yeppp, which is limited to reals.  I may be able to redesign my algorithm 
with all reals and get faster execution with Julia than with Python, which 
does not have a wrapper for Yeppp that I could find.  Doing so may also 
involve vectorized dot products with the BLAS library.  Since I am 
processing GB-sized vectors, this involves large temporary vectors, and I 
may not have enough RAM.  Also, a vectorized approach contradicts the advice 
I was given previously in this forum to write my loops in pure Julia for 
speed.  So I 
would still like to hear from forum members about how to get parallelized 
pure Julia code executing faster than single-threaded.  Maybe I have to 
wait for Julia v0.5 for this to manifest.
