On Sat, Apr 2, 2016 at 8:33 AM, Erik Schnetter <[email protected]> wrote:
> Can you put a number on the task creating and task switching overhead?
The most obvious expensive part is the `jl_setjmp` and `jl_longjmp`; I'm not
really sure how long those take. There is also other overhead, like finding
another task to run. For this particular example, the task version is doing:

1. Box a value.
2. Switch task.
3. Return the boxed value.
4. Since it is not type stable (I don't think type inference can handle
   tasks), a dynamic dispatch and another boxing of the result.
5. Switch task again.

Every single one of the above is far more expensive than the integer
addition.

@Carl, if you want to write code in this style, I believe the right
abstraction is a generator (i.e. an iterator). Support was added on master
and there is ongoing work on making it faster.

> For example, if everything runs on a single core, task switching could
> (theoretically) happen within 100 ns on a modern CPU. Whether that is
> the case depends on the tradeoffs and choices made during design and
> implementation, hence the question.
>
> -erik
>
> On Sat, Apr 2, 2016 at 8:28 AM, Yichao Yu <[email protected]> wrote:
>> On Fri, Apr 1, 2016 at 3:45 PM, Carl <[email protected]> wrote:
>>> Hello,
>>>
>>> Julia is great. I'm a new fan and trying to figure out how to write
>>> simple coroutines that are fast. So far I have this:
>>>
>>> function task_iter(vec)
>>>     @task begin
>>>         for v in vec
>>>             produce(v)
>>>         end
>>>     end
>>> end
>>>
>>> function task_sum(vec)
>>>     s = 0.0
>>>     for val in task_iter(vec)
>>>         s += val
>>>     end
>>>     return s
>>> end
>>>
>>> function normal_sum(vec)
>>>     s = 0.0
>>>     for val in vec
>>>         s += val
>>>     end
>>>     return s
>>> end
>>>
>>> values = rand(10^6)
>>> task_sum(values)
>>> normal_sum(values)
>>>
>>> @time task_sum(values)
>>> @time normal_sum(values)
>>>
>>> 1.067081 seconds (2.00 M allocations: 30.535 MB, 1.95% gc time)
>>> 0.006656 seconds (5 allocations: 176 bytes)
>>>
>>> I was hoping to get the speeds to match as closely as possible. I've
>>> read the performance tips and I can't find anything I'm doing wrong. I
>>> also tried 0.5, thinking that it might be faster with support for fast
>>> anonymous functions, but it was slower (1.5 seconds).
>>
>> Tasks are expensive and are basically designed for IO.
>> A ~1000x slowdown for this simple stuff is expected.
>>
>>> Carl
>
> --
> Erik Schnetter <[email protected]>
> http://www.perimeterinstitute.ca/personal/eschnetter/
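[Editor's note] A hedged sketch of the two points raised in the thread, not code from any of the emails. The first function illustrates the generator/iterator style Yichao suggests (`gen_sum` is an illustrative name); the second is one way to "put a number on" the task-switch overhead Erik asks about. Channels replaced `produce`/`consume` in later Julia versions, so the second function assumes the Julia >= 1.3 `Channel{T}(func, size)` form rather than the 0.4-era API the thread uses.

```julia
# 1. Generator/iterator style: a generator expression iterates lazily,
#    like the task version, but stays type-stable and involves no task
#    switches or boxing, so it should benchmark close to normal_sum.
function gen_sum(vec)
    return sum(v for v in vec)   # generator syntax, available since 0.5
end

# 2. Rough per-switch cost: an unbuffered Channel forces a task switch
#    for every item, so the average per-item time bounds the round-trip
#    switch cost. (Assumes Julia >= 1.3; not the thread's produce/consume.)
function switch_cost_ns(n)
    ch = Channel{Int}(0) do c
        for i in 1:n
            put!(c, i)           # blocks until the consumer takes the item
        end
    end
    start = time_ns()
    for _ in ch                  # each iteration switches to the producer task
    end
    return (time_ns() - start) / n   # average nanoseconds per item
end
```

`gen_sum(values)` should run at roughly the speed of `normal_sum(values)`, while `switch_cost_ns(10^5)` gives a ballpark per-item figure that is orders of magnitude above the cost of a single `Float64` addition, which is the gap the timings in the thread show.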
