I should add that I don't think the error lies in my `reap` function for remote calls, since I can correctly call the cudakernels.sqrownorms function on the host (process 1) from a remote process:
julia> reap(3, :(reap(1, :(sum(cudakernels.sqrownorms(d_M))))[1] ))[3]
5.149127f6

(The above gets process 3 to call the kernel on process 1 and then returns the result from 3 to 1.)
