BTW, I'm not sure I'm using @parallel correctly, so I also tried starting tasks 
manually with @async: 

using Requests

@time @sync for url in urls
    @async begin
        resp = get(url)
        println("Status: $(resp.status)")
    end
end

But I didn't notice any difference in performance. 
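One pattern that might be worth comparing against (just a sketch, assuming Requests.jl's get yields to the task scheduler while waiting on I/O) is a fixed pool of @async workers draining a shared list of URLs. This bounds the number of open connections while still keeping them overlapped; the pool size of 50 and the fetch_all name are arbitrary choices for illustration:

```julia
# Sketch: bounded-concurrency fetching with a task pool.
# Assumes Requests.jl is installed and its get() yields on I/O.
using Requests

function fetch_all(urls; nworkers=50)
    jobs = copy(urls)       # shared work queue
    statuses = Dict()       # url => HTTP status
    @sync for _ in 1:nworkers
        @async while !isempty(jobs)
            url = pop!(jobs)        # safe: tasks only switch at yield points
            resp = get(url)         # yields here, letting other tasks run
            statuses[url] = resp.status
        end
    end
    statuses
end
```

Since Julia tasks are cooperative and only switch at yield points (like the get call), the unguarded isempty/pop! pair on the shared queue is safe here without locks.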


On Sunday, August 23, 2015 at 5:52:27 AM UTC+3, Andrei Zh wrote:
>
> I'm writing a kind of a web scanner that should retrieve and analyze about 
> 100k URLs as fast as possible. Of course, it will take time anyway, but I'm 
> looking for ways to utilize my CPUs and network as much as possible. 
>
> My initial approach was to add all available processors, pack urls into 
> tasks and run these tasks in parallel: 
>
>     
> using Requests
> urls = ...
> @time @sync @parallel for url in urls
>     resp = get(url)
>     println("Status: $(resp.status)")
> end
>
> My assumption was that 100k tasks would be created, each task would 
> execute a GET request and, since this is an IO operation, free the current 
> thread for other tasks. From the logs, however, I see that each worker 
> executes tasks one by one, every time waiting for the GET request to finish. 
>
> So how do I start 100k requests in parallel? 
>
> (100k is here just an example; I can easily split them into chunks of 
> 10k, for instance, so system limits and overloading the CPU/network are not 
> an issue; the issue is their *underutilization*). 
>
> Thanks
>
>
