BTW, I'm not sure I'm using @parallel correctly, so I also tried starting
the tasks manually with @async:
using Requests             # as in the original snippet below

@time @sync for url in urls
    @async begin           # one task per URL; @sync waits for all of them
        resp = get(url)
        println("Status: $(resp.status)")
    end
end
But I didn't notice any difference in performance.
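
In case it helps, here's the bounded-concurrency variant I want to try next:
instead of one task per URL, start a fixed number of tasks that pull URLs
from a shared Channel, so only a limited number of requests is in flight at
once. This is just a sketch: it assumes a Julia version whose Channel is
iterable, that Requests' get yields to the scheduler during network IO, and
fetch_all and the limit of 50 are made up for illustration.

using Requests

# Sketch: `limit` tasks pull URLs from a shared queue, so at most
# `limit` requests are in flight at any moment. (fetch_all and
# limit=50 are illustrative, not a tested API.)
function fetch_all(urls, limit=50)
    jobs = Channel{String}(length(urls))
    for url in urls
        put!(jobs, url)
    end
    close(jobs)                  # tasks exit once the queue drains
    @sync for _ in 1:limit
        @async for url in jobs   # iterating the Channel blocks until empty+closed
            resp = get(url)
            println("Status: $(resp.status)")
        end
    end
end

@time fetch_all(urls)
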
On Sunday, August 23, 2015 at 5:52:27 AM UTC+3, Andrei Zh wrote:
>
> I'm writing a kind of web scanner that should retrieve and analyze about
> 100k URLs as fast as possible. Of course, it will take time anyway, but I'm
> looking for ways to utilize my CPUs and network as fully as possible.
>
> My initial approach was to add all available processors, pack the URLs into
> tasks, and run these tasks in parallel:
>
> using Requests
> urls = ...
> @time @sync @parallel for url in urls
>     resp = get(url)
>     println("Status: $(resp.status)")
> end
>
> My assumption was that 100k tasks would be created, and that each task would
> execute a GET request and, since this is an IO operation, free the current
> thread for other tasks. From the logs, however, I see that each worker
> executes its tasks one by one, waiting each time for the GET request to
> finish.
>
> So how do I start 100k requests in parallel?
>
> (100k here is just an example; I can easily split them into chunks of, say,
> 10k, so system limits and an overloaded CPU/network are not the issue; the
> issue is their *underutilization*).
>
> Thanks
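
P.S. As far as I understand it now, @parallel only splits the loop's
iterations across the worker processes; within each worker they still run
sequentially, which matches what the logs show. One way to get overlap on
both levels is to pmap chunks of URLs to the workers and fetch each chunk
concurrently with @async inside the worker. A rough sketch (fetch_chunk and
the chunk size of 100 are made up for illustration; it assumes workers were
added with addprocs and, again, that get yields during IO):

@everywhere using Requests

# Fetch one chunk concurrently inside a single worker process.
@everywhere function fetch_chunk(chunk)
    @sync for url in chunk
        @async begin
            resp = get(url)
            println("Status: $(resp.status)")
        end
    end
end

# Hand chunks of 100 URLs to the workers; pmap balances the load.
chunks = [urls[i:min(i + 99, end)] for i in 1:100:length(urls)]
@time pmap(fetch_chunk, chunks)

With this split, each worker keeps up to 100 requests in flight while pmap
spreads the chunks across the processes.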