Ok. My original code certainly spends most of the time on looping the interpolation. In that sense, the example I post here is similar to the original code and hightlights the problem I am facing I think. Fwiw, I wrap the original code in a function. I also do it above, at least for the performance relevant part I guess. If I do it for the whole code above -- no difference.
The first time I translated the code from MATLAB to Julia, I actually computed the expectation point by point with 2 double loops (or also with 1 loop and then reshaping) -- it was slower than vectorizing. (see also the stackoverflow post) . @Lutfullah Tomak: I don't really get that point, maybe I am misunderstanding something but the innermost part of the loop is wrapped in a function, so there should be no global variable?!