On Sat, Apr 2, 2016 at 8:33 AM, Erik Schnetter <[email protected]> wrote:
> Can you put a number on the task creating and task switching overhead?

The most obviously expensive parts are the `jl_setjmp` and `jl_longjmp` calls.
I'm not really sure how long those take.
There is also other overhead, such as finding another task to run.

For this particular example, the task version does the following for
each element:

1. Box a value
2. Switch task
3. Return the boxed value
4. Do a dynamic dispatch and box the result again, since the code is
not type stable (I don't think type inference can handle tasks)
5. Switch task again.

Every single one of the steps above is way more expensive than the
floating-point addition itself.
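
As a rough illustration of what steps 1 and 4 cost on their own, here is
a minimal sketch that runs the same loop over a `Vector{Any}`, so every
element is boxed and every `+` goes through dynamic dispatch, with no
task switch involved. The names `sum_loop`, `vals` and `boxed` are made
up for this example. It only isolates the boxing and dispatch part, not
the `jl_setjmp`/`jl_longjmp` cost, and I'm not quoting timings for it.

```julia
# Same loop as normal_sum, reused for both a typed and an untyped vector.
function sum_loop(vec)
    s = 0.0
    for val in vec
        s += val   # with Vector{Any}, the type of `val` is unknown here
    end
    return s
end

vals  = rand(10^6)             # Vector{Float64}: type-stable loop
boxed = Any[v for v in vals]   # Vector{Any}: every element is boxed

# Warm up first so compilation is not included in the timings.
sum_loop(vals)
sum_loop(boxed)

@time sum_loop(vals)
@time sum_loop(boxed)
```

Whatever gap shows up between the two timings is overhead the task
version pays in addition to the two task switches per element.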

@Carl,

If you want to write it in this style, I believe the right abstraction
is a generator (i.e. an iterator). Support for it has been added on
master, and there's work on making it faster.
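
A minimal sketch of what that could look like, assuming a Julia version
that has generator expressions (the `(v for v in vec)` syntax). The
names `gen_iter`, `gen_sum` and `vals` are just for illustration, and
how close this gets to `normal_sum` depends on the version you run it
on:

```julia
# Lazily yields the elements of `vec` without creating a task.
function gen_iter(vec)
    return (v for v in vec)
end

# Same accumulation loop as task_sum, but over the generator.
function gen_sum(vec)
    s = 0.0
    for val in gen_iter(vec)
        s += val
    end
    return s
end

vals = rand(10^6)
gen_sum(vals)          # warm up so compilation is not timed
@time gen_sum(vals)
```

The loop body stays the same as in `task_sum`; only the producer
changes, so there is no task switch per element.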

>
> For example, if everything runs on a single core, task switching could
> (theoretically) happen within 100 ns on a modern CPU. Whether that is
> the case depends on the tradeoffs and choices made during design and
> implementation, hence the question.
>
> -erik
>
> On Sat, Apr 2, 2016 at 8:28 AM, Yichao Yu <[email protected]> wrote:
>> On Fri, Apr 1, 2016 at 3:45 PM, Carl <[email protected]> wrote:
>>> Hello,
>>>
>>> Julia is great.  I'm a new fan and trying to figure out how to write simple
>>> coroutines that are fast.  So far I have this:
>>>
>>>
>>> function task_iter(vec)
>>>     @task begin
>>>         i = 1
>>>         for v in vec
>>>             produce(v)
>>>         end
>>>     end
>>> end
>>>
>>> function task_sum(vec)
>>>     s = 0.0
>>>     for val in task_iter(vec)
>>>         s += val
>>>     end
>>>     return s
>>> end
>>>
>>> function normal_sum(vec)
>>>     s = 0.0
>>>     for val in vec
>>>         s += val
>>>     end
>>>     return s
>>> end
>>>
>>> values = rand(10^6)
>>> task_sum(values)
>>> normal_sum(values)
>>>
>>> @time task_sum(values)
>>> @time normal_sum(values)
>>>
>>>
>>>
>>>   1.067081 seconds (2.00 M allocations: 30.535 MB, 1.95% gc time)
>>>   0.006656 seconds (5 allocations: 176 bytes)
>>>
>>> I was hoping to be able to get the speeds to match (as close as possible).
>>> I've read the performance tips and I can't find anything I'm doing wrong.  I
>>> also tried out 0.5, thinking that it might be faster thanks to its support
>>> for fast anonymous functions, but it was slower (1.5 seconds).
>>
>> Tasks are expensive and are basically designed for IO.
>> ~1000x slow down for this simple stuff is expected.
>>
>>>
>>>
>>> Carl
>>>
>
>
>
> --
> Erik Schnetter <[email protected]>
> http://www.perimeterinstitute.ca/personal/eschnetter/
