I did run it multiple times, yes. I've tried a couple of different devectorizations of my algorithms, and none result in speedups; most are actually slightly slower. I find this a bit strange, because memory allocation and garbage collection are *far lower* when I devectorize, yet that doesn't translate into performance improvements. Also, as I said before, I'm most curious about the current status of operations of the form:
    [1 2; 3 4] .* [1 2]

Is such an operation covered by BLAS?

On Thursday, May 12, 2016 at 5:58:22 AM UTC-7, Tim Holy wrote:
> Did you run it twice? Remember that memory is allocated during JIT
> compilation, so the amount of memory on the first call is completely
> meaningless.
>
> --Tim
>
> On Wednesday, May 11, 2016 11:03:38 PM Anonymous wrote:
> > In response to both Kristoffer's and Keno's timely responses:
> >
> > Originally I just did a simple @time test of the form
> >
> >     matrix .* row vector
> >
> > and then tested the same thing with for loops, and the for loops were
> > way faster (and used way less memory).
> >
> > However, I just devectorized one of my algorithms and ran an @time
> > comparison, and the vectorized version was actually twice as fast as
> > the devectorized version, although it used far more memory. Clearly I
> > don't really understand the specifics of what makes code slow, and in
> > particular how vectorized code compares to devectorized code.
> > Vectorized code does seem to use a lot more memory, but for my
> > algorithm it nevertheless runs faster than the devectorized version.
> > Is there a reference I could look at that explains this to someone
> > with a background in math but not much knowledge of computer
> > architecture?
> >
> > On Wednesday, May 11, 2016 at 10:41:55 PM UTC-7, Keno Fischer wrote:
> > > There seems to be a myth going around that vectorized code in Julia
> > > is slow. That's not really the case. Oftentimes it's just that
> > > devectorized code is faster because one can manually perform
> > > operations such as loop fusion, which the compiler cannot currently
> > > reason about (and most C compilers can't either). In some other
> > > languages those benefits get drowned out by language overhead, but
> > > in Julia those kinds of constructs are generally fast. The cases
> > > where Julia can be slower are when there is excessive memory
> > > allocation in a tight inner loop, but those cases can usually be
> > > rewritten fairly easily without losing the vectorized look of the
> > > code.
> > >
> > > On Thu, May 12, 2016 at 1:35 AM, Kristoffer Carlsson
> > > <[email protected]> wrote:
> > > > It is always easier to discuss if there is a piece of code to
> > > > look at. Could you perhaps post a few code examples that do not
> > > > run as fast as you want?
> > > > Also, make sure to look at
> > > > https://github.com/IntelLabs/ParallelAccelerator.jl. They have a
> > > > quite sophisticated compiler that does loop fusion and
> > > > parallelization and other cool stuff.
> > > >
> > > > On Thursday, May 12, 2016 at 7:22:24 AM UTC+2, Anonymous wrote:
> > > >> This remains one of the main drawbacks of Julia, and the
> > > >> Devectorize package is basically useless, as it doesn't support
> > > >> some really crucial vectorized operations. I'd really prefer not
> > > >> to rewrite all my vectorized code into nested loops if at all
> > > >> possible, but I really need more speed. Can anyone tell me the
> > > >> timeline and future plans for making vectorized code run at C
> > > >> speed?
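As far as I know, elementwise broadcasts like `.*` are not BLAS calls at all — BLAS covers dense linear-algebra kernels such as matrix multiplication, while a broadcast compiles down to an ordinary Julia loop. So the comparison really is "allocating broadcast vs. hand-written loop". A minimal sketch of that comparison (array sizes and function names are my own choices, not from the thread), run twice per Tim's point about JIT warmup:

```julia
A = rand(1000, 1000)
v = rand(1, 1000)           # row vector, broadcast across the columns of A

# Vectorized: allocates a fresh 1000x1000 result array on every call.
f_vec(A, v) = A .* v

# Devectorized: writes into a preallocated output, no temporary.
function f_loop!(out, A, v)
    for j in 1:size(A, 2), i in 1:size(A, 1)
        out[i, j] = A[i, j] * v[j]
    end
    return out
end

out = similar(A)
@time f_vec(A, v)           # first call includes JIT compilation
@time f_vec(A, v)           # second call reflects steady-state cost
@time f_loop!(out, A, v)
@time f_loop!(out, A, v)

# Both versions compute the same thing.
@assert f_vec(A, v) == f_loop!(out, A, v)
```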

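On Keno's point above about excessive allocation in a tight inner loop: a hedged sketch of what "rewriting without losing the vectorized look" can mean. The iterative update below is a made-up example (not from the thread); the in-place version hoists the allocation out of the loop with `broadcast!` so no garbage is produced per iteration.

```julia
# Naive version: `x .* w` allocates a new array on every iteration.
function iterate_naive(x, w, n)
    for _ in 1:n
        x = x .* w
    end
    return x
end

# In-place version: broadcast!(*, x, x, w) reuses x's storage,
# keeping the elementwise style with zero per-iteration allocation.
function iterate_inplace!(x, w, n)
    for _ in 1:n
        broadcast!(*, x, x, w)
    end
    return x
end

x = fill(2.0, 4)
w = fill(0.5, 4)
y = iterate_naive(copy(x), w, 3)
iterate_inplace!(x, w, 3)
@assert x == y              # same result, far fewer allocations
```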