If you are doing a lot of parallel DNS queries, you may want to try
increasing the number that can run simultaneously by setting the
UV_THREADPOOL_SIZE environment variable to something larger before starting
Julia (the default is 4, the maximum is 128). libuv runs getaddrinfo lookups
on that threadpool, so its size caps how many lookups are in flight at once.
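
A minimal sketch of that advice, assuming Julia 1.6 or later (needed for addenv): it launches a fresh Julia process with UV_THREADPOOL_SIZE already in its environment. libuv reads the variable when its threadpool is first started, which is why it belongs in the environment before Julia starts rather than being set from inside a running session. The value 16 is an arbitrary example. From a shell, the equivalent is to prefix the julia command with UV_THREADPOOL_SIZE=16.

```julia
# Start a fresh Julia process (same binary and flags as this one) with
# UV_THREADPOOL_SIZE=16 in its environment; 16 is an arbitrary example value.
# The child just prints the variable back so we can see it was set.
cmd = addenv(`$(Base.julia_cmd()) -e 'print(ENV["UV_THREADPOOL_SIZE"])'`,
             "UV_THREADPOOL_SIZE" => "16")
out = read(cmd, String)
println("child sees UV_THREADPOOL_SIZE = ", out)
```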

On Mon, Aug 24, 2015 at 9:17 AM Andrei Zh wrote:

> Jonathan, thanks for your support. So far I've noticed that DNS adds a pretty
> large delay. E.g. resolving IP addresses for 1000 URLs took 80 seconds in
> serial code and 26 seconds in multi-task code:
>
>
> Serial execution:
>
> julia> @time for url in urls
>                begin
>                    Base.getaddrinfo(URI(url).host)
>                end
>            end
> elapsed time: 80.071810293 seconds (732400 bytes allocated)
>
>
> Multitask execution:
>
> julia> @time @sync for url in urls
>            @async begin
>                Base.getaddrinfo(URI(url).host)
>            end
>        end
> elapsed time: 26.241893516 seconds (4277968 bytes allocated)
>
> So I'll try to pre-resolve IPs and test again.
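
One way to pre-resolve, sketched for Julia 1.x (where getaddrinfo lives in the Sockets standard library rather than Base): resolve each distinct host once, with one task per host, and cache the answers so later requests skip the DNS step. The preresolve function and the host list are hypothetical; in the thread the hosts would come from URI(url).host.

```julia
using Sockets

# Hypothetical input: hosts extracted from many URLs, with repeats.
hosts = ["localhost", "127.0.0.1", "localhost"]

# Resolve each distinct host once, one task per host, and keep the answers
# in a Dict so later requests can skip the DNS step.
function preresolve(hosts)
    cache = Dict{String,IPAddr}()
    @sync for h in unique(hosts)
        @async begin
            # @async tasks run cooperatively on one thread and only switch at
            # yield points (such as inside getaddrinfo), so these writes do not race.
            cache[h] = getaddrinfo(h)   # IPv4 by default
        end
    end
    return cache
end

cache = preresolve(hosts)
```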
>
>
> On Monday, August 24, 2015 at 4:01:44 PM UTC+3, Jonathan Malmaud wrote:
>
>> As one of the maintainers of Requests.jl, I'm especially interested in
>> its use for high-performance applications, so don't hesitate to file an
>> issue if it gives you any performance problems.
>>
>> On Sunday, August 23, 2015 at 7:40:08 PM UTC-4, Andrei Zh wrote:
>>>
>>> Hi Steven,
>>>
>>> thanks for your answer! It turns out I misunderstood @async a long time
>>> ago, assuming it also makes a remote call to other processes and thus
>>> introduces true multi-tasking. So now I need to rethink my approach before
>>> going further.
>>>
>>> Just to clarify: my goal is to perform as many requests as possible at
>>> the same time, so I want to use both multiple processes (to start several
>>> requests on several cores in parallel) and tasks (to launch new requests
>>> while old ones are still waiting for I/O to complete).
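
The combined plan described above could be sketched in Julia 1.x with the Distributed standard library (where @parallel now lives, renamed @distributed) plus tasks inside each worker. Everything here is illustrative and assumed: process_batch is a hypothetical helper, sleep(0.5) stands in for a real request, doubling stands in for handling the response, and two workers is an arbitrary choice. pmap is used to hand each batch to an available worker process.

```julia
using Distributed

# Assumption: at least 2 cores; adjust the worker count to taste.
addprocs(2)

# Defined on every process. Inside one worker, each item gets its own task,
# so that worker's (simulated) requests overlap while they wait on I/O.
@everywhere function process_batch(items)
    results = Vector{Int}(undef, length(items))
    @sync for (i, x) in enumerate(items)
        @async begin
            sleep(0.5)        # hypothetical stand-in for a network request
            results[i] = 2x   # hypothetical stand-in for handling the response
        end
    end
    return results
end

# Two batches; pmap sends each one to an available worker process.
items = collect(1:10)
batches = [items[1:5], items[6:10]]
out = pmap(process_batch, batches)
```

Processes give parallelism across cores, while the tasks inside each process keep it busy while individual requests wait. For purely I/O-bound work, tasks in a single process may already be enough.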
>>>
>>> So I will update my approach and come back with results or new
>>> questions.
>>>
>>>
>>>
>>> On Monday, August 24, 2015 at 2:13:23 AM UTC+3, Steven G. Johnson wrote:
>>>>
>>>> @parallel in Julia is for executing separate parallel processes (true
>>>> multi-tasking, with separate address spaces).  @async is for "tasks", which
>>>> are "green threads" and represent cooperative multitasking (within the same
>>>> process and the same address space).
>>>>
>>>> I/O in Julia is asynchronous — while one task is blocked waiting on
>>>> I/O, another task will wake up and start running.  (This is based on the
>>>> libuv library, which is designed for high-performance asynchronous I/O.)
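
A minimal, network-free sketch of this behaviour in Julia 1.x: three tasks each wait one second, with sleep standing in for a blocked socket read because it also yields to the scheduler. Run with @sync/@async, the loop should take roughly one second rather than three. simulated_request is a hypothetical helper.

```julia
# Hypothetical request that spends 1 second "waiting on I/O".
# sleep() yields to the scheduler, just as a blocked socket read would.
function simulated_request(i)
    sleep(1.0)
    return i
end

results = Vector{Int}(undef, 3)
t = @elapsed @sync for i in 1:3
    @async begin
        results[i] = simulated_request(i)
    end
end

# While one task sleeps, the others run, so the waits overlap.
println("3 requests finished in $(round(t; digits=2)) s")
```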
>>>>
>>>> The first question is whether you want to fetch URLs in separate OS
>>>> processes, or you want to use green threads within the same process.  It
>>>> sounds like you want the latter, in which case @async is the right thing.
>>>>
>>>> The second question is whether something about the Requests.jl package
>>>> is serializing things somehow; for that you might file an issue at
>>>> Requests.jl.
>>>>
>>>