Instead of using tic() and toc(), use @time to time your loops. You will find that your sequential loop allocates a lot of memory, while the @parallel loop does not, and that memory allocation accounts for the difference in time. One of my students ran into this earlier this week, and that was the cause in his case. My understanding is that the compiler does not optimize for loops run at the top level. When you put the sequential loop in a function, the excessive memory allocation goes away, which makes the sequential loop faster.
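
As a rough illustration of the top-level versus function comparison, here is a minimal sketch. It assumes Julia 1.x syntax, and the workload (a sum of square roots), the name sum_sqrt, and the size n are all made up for the example, not taken from your code.

```julia
# Hypothetical workload: sum of square roots over 1:n.
n = 10^6

# Version 1: loop at the top level, accumulating into a non-constant global.
total = 0.0
@time for i in 1:n
    global total += sqrt(i)   # `global` is needed in a script on Julia 1.x
end

# Version 2: the same loop wrapped in a function, using a local accumulator.
function sum_sqrt(n)
    s = 0.0
    for i in 1:n
        s += sqrt(i)
    end
    return s
end

sum_sqrt(1)          # warm-up call so compilation is not included in the timing
@time sum_sqrt(n)
```

Compare the allocation figures that @time prints for the two versions. The exact numbers depend on your machine and Julia version, but the top-level loop should report far more allocated memory than the function call.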
You need to be careful using @parallel with no worker processes. With no workers, the @parallel loop can modify globals and you will get the correct result, because everything runs in the same process. When you add workers, the globals are copied to each worker, the changes are made to each worker's copy, and the result is not copied back to the master process. So code that works with no workers will break once you add workers.
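
Here is a small sketch of that pitfall. It assumes Julia 1.x, where @parallel has been renamed @distributed and lives in the Distributed standard library. The array a, the worker count, and the loop bounds are hypothetical.

```julia
using Distributed
addprocs(2)              # hypothetical: start two worker processes

a = zeros(10)

# Each worker receives its own copy of `a` and writes to that copy only.
@sync @distributed for i in 1:10
    a[i] = i
end
# On the master process, `a` has not been modified by the loop.

# One way around this is a reducer, which sends each worker's partial
# result back to the master process and combines them with (+).
total = @distributed (+) for i in 1:10
    i
end
```

If you run the same loop without the addprocs call, the writes to a happen in the master process and the array is filled in, which is why the problem only shows up once workers are added.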
