I submitted a new issue <https://github.com/JuliaWeb/Requests.jl/issues/61> to Requests.jl describing the problem and my observations. In short, each separate request takes a reasonable time, but when I launch a lot of them in tasks, they become very slow.
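
A minimal sketch of how the per-task latency in such a batch could be measured. This is not the code from the issue: it assumes Julia 1.x, and `fake_request` and `timed_batch` are hypothetical names, with `sleep` standing in for the real HTTP call so the sketch runs without a network.

```julia
# `fake_request` is a hypothetical stand-in for one HTTP request; it only sleeps.
fake_request(delay) = (sleep(delay); delay)

# Start one task per "URL" and record how long each task took,
# measured from the moment the task starts running until it finishes.
function timed_batch(n; delay = 0.05)
    latencies = zeros(n)
    @sync for i in 1:n
        @async begin
            t0 = time()
            fake_request(delay)
            latencies[i] = time() - t0
        end
    end
    return latencies
end

lat = timed_batch(200)
println("min latency: ", minimum(lat), " s, max latency: ", maximum(lat), " s")
```

With a real request function in place of `fake_request`, comparing the minimum and maximum latency for different batch sizes would show whether latency grows with the number of tasks in flight.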
On Tuesday, August 25, 2015 at 2:44:30 AM UTC+3, Andrei Zh wrote:
>
> @Jameson: setting UV_THREADPOOL_SIZE to 32 seems to cut DNS resolution
> time in half (from 26 to 12 seconds in my latest tests), so thank you.
>
> However, DNS does not seem to be the only root of the problem: I noticed
> that with a large number of URLs the first ones get a result very quickly
> (300-800 ms), but then latency begins to grow until the timeout is exceeded
> (I've seen more than 100 seconds for some requests).
>
> I will try to set up a stable test and post it here as well as in the
> Requests.jl issues.
>
>
> On Monday, August 24, 2015 at 8:21:51 PM UTC+3, Jameson wrote:
>>
>> If you are doing a lot of parallel DNS queries, you may want to try
>> increasing the number that can be run simultaneously by setting the
>> UV_THREADPOOL_SIZE environment variable before starting julia to something
>> larger (default is 4, max is 128).
>>
>> On Mon, Aug 24, 2015 at 9:17 AM Andrei Zh wrote:
>>
>>> Jonathan, thanks for your support. So far I noticed that DNS gives
>>> a pretty large delay. E.g. resolving IP addresses for 1000 URLs took 80
>>> seconds in serial code and 26 seconds in multi-task code:
>>>
>>>
>>> Serial execution:
>>>
>>> julia> @time for url in urls
>>>            begin
>>>                Base.getaddrinfo(URI(url).host)
>>>            end
>>>        end
>>> elapsed time: 80.071810293 seconds (732400 bytes allocated)
>>>
>>>
>>> Multitask execution:
>>>
>>> julia> @time @sync for url in urls
>>>            @async begin
>>>                Base.getaddrinfo(URI(url).host)
>>>            end
>>>        end
>>> elapsed time: 26.241893516 seconds (4277968 bytes allocated)
>>>
>>> So I'll try to pre-resolve IPs and test again.
>>>
>>>
>>> On Monday, August 24, 2015 at 4:01:44 PM UTC+3, Jonathan Malmaud wrote:
>>>
>>>> As one of the maintainers of Requests.jl, I'm especially interested in
>>>> its use for high-performance applications so don't hesitate to file an
>>>> issue if it gives you any performance problems.
>>>>
>>>> On Sunday, August 23, 2015 at 7:40:08 PM UTC-4, Andrei Zh wrote:
>>>>>
>>>>> Hi Steven,
>>>>>
>>>>> thanks for your answer! It turns out I misunderstood @async a long time
>>>>> ago, assuming it also makes a remote call to other processes and thus
>>>>> introduces true multi-tasking. So now I need to rethink my approach
>>>>> before going further.
>>>>>
>>>>> Just to clarify: my goal is to perform as many requests as possible at
>>>>> the same time, so I want to use both multiple processes (to start
>>>>> several requests on several cores in parallel) and tasks (to launch new
>>>>> requests while old ones are still waiting for IO to complete).
>>>>>
>>>>> So I will update my approach and come back with results or new
>>>>> questions.
>>>>>
>>>>>
>>>>> On Monday, August 24, 2015 at 2:13:23 AM UTC+3, Steven G. Johnson wrote:
>>>>>>
>>>>>> @parallel in Julia is for executing separate parallel processes (true
>>>>>> multi-tasking, with separate address spaces). @async is for "tasks",
>>>>>> which are "green threads" and represent cooperative multitasking
>>>>>> (within the same process and the same address space).
>>>>>>
>>>>>> I/O in Julia is asynchronous: while one task is blocked waiting on
>>>>>> I/O, another task will wake up and start running. (This is based on the
>>>>>> libuv library, which is designed for high-performance asynchronous I/O.)
>>>>>>
>>>>>> The first question is whether you want to fetch URLs in separate OS
>>>>>> processes, or you want to use green threads within the same process. It
>>>>>> sounds like you want the latter, in which case @async is the right thing.
>>>>>>
>>>>>> The second question is whether something about the Requests.jl
>>>>>> package is serializing things somehow; for that you might file an issue
>>>>>> at Requests.jl.
>>>>>>
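
The point about tasks overlapping while blocked on I/O can be illustrated with a small sketch. It assumes Julia 1.x (where `@parallel` has since been renamed `@distributed`); `wait_io`, `run_serial`, and `run_tasks` are hypothetical names, and `sleep` is only a stand-in for a blocking I/O call such as an HTTP request.

```julia
# `wait_io` is a hypothetical stand-in for a blocking I/O call.
wait_io(seconds) = sleep(seconds)

# Run the waits one after another in a single task.
function run_serial(n, seconds)
    for _ in 1:n
        wait_io(seconds)
    end
end

# Run each wait in its own task; while one task is blocked,
# the scheduler switches to another task in the same process.
function run_tasks(n, seconds)
    @sync for _ in 1:n
        @async wait_io(seconds)
    end
end

# Warm up so compilation time is not included in the measurements.
run_serial(1, 0.01)
run_tasks(1, 0.01)

t_serial = @elapsed run_serial(10, 0.1)
t_tasks = @elapsed run_tasks(10, 0.1)
println("serial: ", t_serial, " s, tasks: ", t_tasks, " s")
```

If a library serializes its requests internally, swapping its request function in for `wait_io` would be expected to bring the two timings close together, which is one way to check the second question above.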
