For me it takes 0.27 seconds (which can be reduced to 0.09sec by
optimization, see below), so you are doing something wrong if you are
getting 10x slower numbers.
My guess is that you are including the Julia startup time in the timings.
Instead, do:
@time println(primes(10000000))
@time println(primes(10000000))
in your code, to time just the execution of primes(10000000) and print the
results, running the benchmark twice because the first time you run
something in Julia it is a bit slower due to compilation time. (The same
is true in PyPy.)
PS. It is a bit better to use
sieveTo = isqrt(N)
so that sieveTo is an integer, as "ceil" returns a floating-point value
(which makes i and j floating-point as well). Or you could use
int(ceil(sqrt(N)), but isqrt is a bit more careful about roundoff errors.
Also, your S array is a floating-point array, whereas you only need an
array of bits. Use
S = ones(Bool, N)
instead. This brings the time down to 0.09 seconds for me, comparable to
your C numbers.