Jonathan, thanks for your support. So far I've noticed that DNS resolution
adds a pretty large delay. E.g. resolving IP addresses for 1000 URLs took
80 seconds in serial code and 26 seconds in multi-task code:
Serial execution:
julia> @time for url in urls
           begin
               Base.getaddrinfo(URI(url).host)
           end
       end
elapsed time: 80.071810293 seconds (732400 bytes allocated)
Multitask execution:
julia> @time @sync for url in urls
           @async begin
               Base.getaddrinfo(URI(url).host)
           end
       end
elapsed time: 26.241893516 seconds (4277968 bytes allocated)
So I'll try to pre-resolve IPs and test again.
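
For reference, here is a minimal sketch of what the pre-resolving step could
look like. It assumes a recent Julia, where getaddrinfo lives in the Sockets
standard library (in the session above it is Base.getaddrinfo). The names
host_of and preresolve are hypothetical, and the regex is only a simplistic
stand-in for URI(url).host. The idea is to resolve each distinct host once,
one task per host, and keep the results in a Dict for later requests:

```julia
using Sockets  # assumption: Julia 1.x; the 2015-era equivalent is Base.getaddrinfo

# Hypothetical helper: extract the host part of a URL with a simple regex.
# URI(url).host from the snippets above does this more robustly.
function host_of(url::AbstractString)
    m = match(r"^[a-z][a-z0-9+.-]*://([^/:?#]+)"i, url)
    m === nothing ? nothing : String(m.captures[1])
end

# Resolve every distinct host once, one task per host, and cache the result.
# `resolve` is a keyword argument so the lookup function can be swapped out.
function preresolve(urls; resolve = getaddrinfo)
    hosts = unique(filter(h -> h !== nothing, map(host_of, urls)))
    cache = Dict{String,Any}()
    @sync for host in hosts
        @async begin
            try
                cache[host] = resolve(host)
            catch err
                cache[host] = err  # remember the failure instead of retrying
            end
        end
    end
    return cache
end
```

With something like ips = preresolve(urls), the timed loop can then look up
ips[host] instead of hitting DNS on every request. How much this helps
depends on how many of the 1000 URLs share a host.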
On Monday, August 24, 2015 at 4:01:44 PM UTC+3, Jonathan Malmaud wrote:
>
> As one of the maintainers of Requests.jl, I'm especially interested in its
> use for high-performance applications so don't hesitate to file an issue if
> it gives you any performance problems.
>
> On Sunday, August 23, 2015 at 7:40:08 PM UTC-4, Andrei Zh wrote:
>>
>> Hi Steven,
>>
>> thanks for your answer! It turns out I misunderstood @async a long time
>> ago, assuming it also makes a remote call to other processes and thus
>> introduces true multi-tasking. So now I need to rethink my approach before
>> going further.
>>
>> Just to clarify: my goal is to perform as many requests as possible at
>> the same time, so I want to use both multiple processes (to start several
>> requests on several cores in parallel) and tasks (to launch new requests
>> while old ones are still waiting for IO to complete).
>>
>> So I will update my approach and come back with results or new questions.
>>
>>
>>
>> On Monday, August 24, 2015 at 2:13:23 AM UTC+3, Steven G. Johnson wrote:
>>>
>>> @parallel in Julia is for executing separate parallel processes (true
>>> multi-tasking, with separate address spaces). @async is for "tasks", which
>>> are "green threads" and represent cooperative multitasking (within the same
>>> process and the same address space).
>>>
>>> I/O in Julia is asynchronous — while one task is blocked waiting on I/O,
>>> another task will wake up and start running. (This is based on the libuv
>>> library, which is designed for high-performance asynchronous I/O.)
>>>
>>> The first question is whether you want to fetch URLs in separate OS
>>> processes, or you want to use green threads within the same process. It
>>> sounds like you want the latter, in which case @async is the right thing.
>>>
>>> The second question is whether something about the Requests.jl package
>>> is serializing things somehow; for that you might file an issue at
>>> Requests.jl.
>>>
>>