Or, for LDC, try mir.math.common: http://docs.algorithm.dlang.io/latest/mir_math_common.html

Can you try it with the C math functions?

Instead of std.math, try using core.stdc.math.
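For anyone following along, here is a minimal sketch of what that swap might look like. The thread does not show the original sigmoid1/sigmoid2 code, so the function name `sigmoid` and its `double` signature are assumptions for illustration only.

```d
// core.stdc.math exposes the C library's math functions directly.
import core.stdc.math : exp;

// Hypothetical sigmoid, not the poster's code: 1 / (1 + e^(-x)).
// The only change from a std.math version is which module `exp` comes from.
double sigmoid(double x)
{
    return 1.0 / (1.0 + exp(-x));
}
```

For `float` inputs, core.stdc.math also provides `expf`, which avoids a round trip through `double`.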

Much better with mir.math.common, but still a bit slower than C (even with larger loops):

10^7 iterations using sigmoid1: 168 ms
10^7 iterations using sigmoid2: 39 ms

Also, LDC optimized away the computation, so I had to modify the code a bit.
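The post does not say how the code was modified. One common way to stop an optimizer from discarding a benchmark loop is to make the result observable, for example by accumulating it and printing the sum. The sketch below assumes that approach; `sigmoid`, `runLoop`, and `report` are hypothetical names, not the poster's code.

```d
import core.stdc.math : exp;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Hypothetical function under test.
double sigmoid(double x)
{
    return 1.0 / (1.0 + exp(-x));
}

// The input depends on the loop index and every result feeds into `sum`,
// so the calls cannot be treated as dead code as long as `sum` is used.
double runLoop(size_t iterations)
{
    double sum = 0.0;
    foreach (i; 0 .. iterations)
        sum += sigmoid(i * 1e-7);
    return sum;
}

// Times the loop and prints the sum so the result is actually consumed.
void report(size_t iterations)
{
    auto sw = StopWatch(AutoStart.yes);
    immutable sum = runLoop(iterations);
    sw.stop();
    writefln("%s iterations: %s ms (sum = %s)",
             iterations, sw.peek.total!"msecs", sum);
}
```

Calling `report(10_000_000)` from `main` would give a timing comparable in shape to the numbers above, though the values will differ by machine and compiler flags.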

Have you tried the LLVM intrinsics, say llvm_exp?
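A rough sketch of that suggestion, assuming LDC's `ldc.intrinsics` module and its `llvm_exp` template. The wrapper name `expImpl` and the `sigmoid` function are hypothetical, and the fallback branch is only there so the snippet also builds with other compilers. Whether the intrinsic is actually faster is something to measure, not assume.

```d
version (LDC)
{
    // LDC-only: maps to the llvm.exp intrinsic.
    import ldc.intrinsics : llvm_exp;

    double expImpl(double x)
    {
        return llvm_exp(x);
    }
}
else
{
    // Fallback for DMD/GDC: plain C library exp.
    import core.stdc.math : exp;

    double expImpl(double x)
    {
        return exp(x);
    }
}

// Hypothetical sigmoid built on whichever exp was selected above.
double sigmoid(double x)
{
    return 1.0 / (1.0 + expImpl(-x));
}
```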

