@Jameson: setting UV_THREADPOOL_SIZE to 32 seems to roughly halve DNS 
resolution time (from 26 to 12 seconds in my latest tests), so thank you. 

However, DNS does not seem to be the only root of the problem: I noticed that 
with a large number of URLs the first ones get a result very quickly 
(300-800 ms), but then latency begins to grow until the timeout is exceeded 
(I've seen more than 100 seconds for some requests). 
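
To make the growing latency visible, something like the following could record a timing per lookup. This is only a minimal sketch: it assumes Julia 1.x (where getaddrinfo lives in the Sockets standard library rather than Base), resolve_latencies is a hypothetical helper and not part of Requests.jl, and the host list is a placeholder.

```julia
using Sockets  # assumption: Julia 1.x, where getaddrinfo lives in Sockets

# Hypothetical helper (not part of Requests.jl): resolve every host in its
# own task and record how long each lookup took, in seconds.
function resolve_latencies(hosts)
    latencies = fill(NaN, length(hosts))
    @sync for (i, host) in enumerate(hosts)
        @async begin
            t0 = time()
            try
                getaddrinfo(host)
            catch err
                # A failed lookup still gets a timing; only the result is dropped.
            end
            latencies[i] = time() - t0
        end
    end
    return latencies
end

# Placeholder host list; substitute the hosts extracted from the real URLs.
hosts = ["localhost", "localhost"]
latencies = resolve_latencies(hosts)
for (host, t) in zip(hosts, latencies)
    println(host, " => ", round(t, digits=3), " s")
end
```

If the timings climb steadily with the position in the list, that would hint at lookups queueing behind each other rather than at individual slow servers.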

I will try to set up a stable test and post it here as well as in the issues 
for Requests.jl. 


On Monday, August 24, 2015 at 8:21:51 PM UTC+3, Jameson wrote:
>
> If you are doing a lot of parallel DNS queries, you may want to try 
> increasing the number that can be run simultaneously by setting the 
> UV_THREADPOOL_SIZE environment variable before starting julia to something 
> larger (default is 4, max is 128).
>
> On Mon, Aug 24, 2015 at 9:17 AM Andrei Zh wrote:
>
>> Jonathan, thanks for your support. So far I have noticed that DNS adds a 
>> pretty large delay. E.g. resolving IP addresses for 1000 URLs took 80 
>> seconds in serial code and 26 seconds in multi-task code: 
>>
>>
>> Serial execution: 
>>
>> julia> @time for url in urls
>>                begin
>>                    Base.getaddrinfo(URI(url).host)
>>                end
>>            end
>> elapsed time: 80.071810293 seconds (732400 bytes allocated)
>>
>>
>> Multitask execution:
>>
>>
>> julia> @time @sync for url in urls
>>
>>            @async begin                                           
>>                Base.getaddrinfo(URI(url).host)
>>            end
>>        end
>>
>> elapsed time: 26.241893516 seconds (4277968 bytes allocated)
>>
>> So I'll try to pre-resolve IPs and test again. 
>>
>>
>> On Monday, August 24, 2015 at 4:01:44 PM UTC+3, Jonathan Malmaud wrote:
>>
>>> As one of the maintainers of Requests.jl, I'm especially interested in 
>>> its use for high-performance applications so don't hesitate to file an 
>>> issue if it gives you any performance problems.
>>>
>>> On Sunday, August 23, 2015 at 7:40:08 PM UTC-4, Andrei Zh wrote:
>>>>
>>>> Hi Steven, 
>>>>
>>>> thanks for your answer! It turns out I misunderstood @async a long time 
>>>> ago, assuming it also makes a remote call to other processes and thus 
>>>> introduces true multi-tasking. So now I need to rethink my approach before 
>>>> going further. 
>>>>
>>>> Just to clarify: my goal is to perform as many requests as possible at 
>>>> the same time, so I want to use both multiple processes (to start 
>>>> several requests on several cores in parallel) and tasks (to launch new 
>>>> requests while old ones are still waiting for IO to complete). 
>>>>
>>>> So I will update my approach and come back with results or new 
>>>> questions. 
>>>>
>>>>
>>>>
>>>> On Monday, August 24, 2015 at 2:13:23 AM UTC+3, Steven G. Johnson wrote:
>>>>>
>>>>> @parallel in Julia is for executing separate parallel processes (true 
>>>>> multi-tasking, with separate address spaces).  @async is for "tasks", 
>>>>> which are "green threads" and represent cooperative multitasking (within 
>>>>> the same process and the same address space).
>>>>>
>>>>> I/O in Julia is asynchronous — while one task is blocked waiting on 
>>>>> I/O, another task will wake up and start running.  (This is based on the 
>>>>> libuv library, which is designed for high-performance asynchronous I/O.)
>>>>>
>>>>> The first question is whether you want to fetch URLs in separate OS 
>>>>> processes, or you want to use green threads within the same process.  It 
>>>>> sounds like you want the latter, in which case @async is the right thing.
>>>>>
>>>>> The second question is whether something about the Requests.jl package 
>>>>> is serializing things somehow; for that you might file an issue at 
>>>>> Requests.jl.
>>>>>
>>>>
