Jonathan, thanks for your support. So far I've noticed that DNS resolution adds 
a pretty large delay. E.g. resolving IP addresses for 1000 URLs took 80 seconds 
in serial code and 26 seconds in multi-task code: 


Serial execution: 

julia> @time for url in urls
               begin
                   Base.getaddrinfo(URI(url).host)
               end
           end
elapsed time: 80.071810293 seconds (732400 bytes allocated)


Multitask execution:


julia> @time @sync for url in urls
               # sleep(0.01)
               @async begin
                   # t = @elapsed resp = get(url)
                   # println("Status: $(resp.status) ($(t) sec)")
                   Base.getaddrinfo(URI(url).host)
               end
           end
elapsed time: 26.241893516 seconds (4277968 bytes allocated)

So I'll try to pre-resolve IPs and test again. 
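
A minimal sketch of what I mean by pre-resolving, not tested against the real URL list. It assumes a current Julia, where getaddrinfo lives in the Sockets standard library (the sessions above call Base.getaddrinfo instead). The function name preresolve and the choice to store errors in the cache are my own placeholders, not anything from Requests.jl:

```julia
using Sockets  # assumption: current Julia; the sessions above used Base.getaddrinfo

# Hypothetical helper: resolve each distinct host once, concurrently,
# and cache the result so that later requests can skip the DNS lookup.
function preresolve(hosts)
    cache = Dict{String,Any}()
    @sync for host in unique(hosts)
        @async begin
            cache[host] = try
                getaddrinfo(host)
            catch err
                err  # keep the error so one bad host does not abort the batch
            end
        end
    end
    return cache
end
```

Usage would be something like cache = preresolve(hosts), where hosts is the list of URI(url).host values. Deduplicating with unique should matter when many URLs share a host, since each host is then looked up only once.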





On Monday, August 24, 2015 at 4:01:44 PM UTC+3, Jonathan Malmaud wrote:
>
> As one of the maintainers of Requests.jl, I'm especially interested in its 
> use for high-performance applications so don't hesitate to file an issue if 
> it gives you any performance problems.
>
> On Sunday, August 23, 2015 at 7:40:08 PM UTC-4, Andrei Zh wrote:
>>
>> Hi Steven, 
>>
>> thanks for your answer! It turns out I misunderstood @async a long time 
>> ago, assuming it also makes a remote call to other processes and thus 
>> introduces true multi-tasking. So now I need to rethink my approach before 
>> going further. 
>>
>> Just to clarify: my goal is to perform as many requests as possible at 
>> the same time, so I want to use both - multiple processes (to start several 
>> requests at several cores in parallel) and tasks (to launch new requests 
>> while old ones are still waiting for IO to complete). 
>>
>> So I will update my approach and come back with results or new questions. 
>>
>>
>>
>> On Monday, August 24, 2015 at 2:13:23 AM UTC+3, Steven G. Johnson wrote:
>>>
>>> @parallel in Julia is for executing separate parallel processes (true 
>>> multi-tasking, with separate address spaces).  @async is for "tasks", which 
>>> are "green threads" and represent cooperative multitasking (within the same 
>>> process and the same address space).
>>>
>>> I/O in Julia is asynchronous — while one task is blocked waiting on I/O, 
>>> another task will wake up and start running.  (This is based on the libuv 
>>> library, which is designed for high-performance asynchronous I/O.)
>>>
>>> The first question is whether you want to fetch URLs in separate OS 
>>> processes, or you want to use green threads within the same process.  It 
>>> sounds like you want the latter, in which case @async is the right thing.
>>>
>>> The second question is whether something about the Requests.jl package 
>>> is serializing things somehow; for that you might file an issue at 
>>> Requests.jl.
>>>
>>
