I did run it multiple times, yes.  I've tried a couple of different 
devectorizations of my algorithms, and none results in a speedup; most 
result in slightly slower run times.  I find it a bit strange, because 
memory allocation and garbage collection are *far lower* when I 
devectorize, yet that doesn't translate into a performance improvement. 
 Also, as I said before, I'm most curious about the current status of 
operations of the form:

[1 2; 3 4] .* [1 2]

Is such an operation covered by BLAS?
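To make the operation concrete, here is what that broadcast computes, next to its explicit-loop equivalent (my own sketch, not from the thread; as far as I know an elementwise broadcast like this is plain Julia code rather than a BLAS call, since BLAS only provides dense linear-algebra kernels such as matrix multiply):

```julia
A = [1 2; 3 4]
v = [1 2]            # a 1x2 row; broadcasting scales each column of A

# Vectorized: .* broadcasts v across the rows of A and
# allocates a fresh 2x2 result.
B = A .* v           # [1 4; 3 8]

# Devectorized equivalent: an explicit loop writing into a
# preallocated output, with no intermediate temporaries.
C = similar(A)
for j in 1:size(A, 2), i in 1:size(A, 1)
    C[i, j] = A[i, j] * v[j]
end
```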

On Thursday, May 12, 2016 at 5:58:22 AM UTC-7, Tim Holy wrote:
>
> Did you run it twice? Remember that memory is allocated during JIT 
> compilation, so the amount of memory on the first call is completely 
> meaningless. 
>
> --Tim 
>
> On Wednesday, May 11, 2016 11:03:38 PM Anonymous wrote: 
> > In response to both Kristoffer and Keno's timely responses, 
> > 
> > Originally I just did a simple @time test of the form 
> > Matrix .* horizontal vector 
> > 
> > and then tested the same thing with for loops, and the for loops were 
> > way faster (and used way less memory). 
> > 
> > However I just devectorized one of my algorithms and ran an @time 
> > comparison, and the vectorized version was actually twice as fast as 
> > the devectorized version; however, the vectorized version used way 
> > more memory.  Clearly I don't really understand the specifics of what 
> > makes code slow, and in particular how vectorized code compares to 
> > devectorized code.  Vectorized code does seem to use a lot more 
> > memory, but clearly for my algorithm it nevertheless runs faster than 
> > the devectorized version.  Is there a reference I could look at that 
> > explains this to someone with a background in math but not much 
> > knowledge of computer architecture? 
> > 
> > On Wednesday, May 11, 2016 at 10:41:55 PM UTC-7, Keno Fischer wrote: 
> > > There seems to be a myth going around that vectorized code in Julia is 
> > > slow. That's not really the case. Often times it's just that 
> > > devectorized code is faster because one can manually perform 
> > > operations such as loop fusion, which the compiler cannot currently 
> > > reason about (and most C compilers can't either). In some other 
> > > languages those benefits get drowned out by language overhead, but in 
> > > julia those kinds of constructs are generally fast. The cases where 
> > > julia can be slower are when there is excessive memory allocation in 
> > > a tight inner loop, but those cases can usually be rewritten fairly 
> > > easily without losing the vectorized look of the code. 
> > > 
> > > On Thu, May 12, 2016 at 1:35 AM, Kristoffer Carlsson 
> > > <[email protected]> wrote: 
> > > > It is always easier to discuss if there is a piece of code to look 
> > > > at. Could you perhaps post a few code examples that do not run as 
> > > > fast as you want? 
> > > > Also, make sure to look at 
> > > > https://github.com/IntelLabs/ParallelAccelerator.jl. They have a 
> > > > quite sophisticated compiler that does loop fusion and 
> > > > parallelization and other cool stuff. 
> > > > 
> > > > On Thursday, May 12, 2016 at 7:22:24 AM UTC+2, Anonymous wrote: 
> > > >> This remains one of the main drawbacks of Julia, and the 
> > > >> Devectorize package is basically useless, as it doesn't support 
> > > >> some really crucial vectorized operations.  I'd really prefer not 
> > > >> to rewrite all my vectorized code into nested loops if at all 
> > > >> possible, but I really need more speed.  Can anyone tell me the 
> > > >> timeline and future plans for making vectorized code run at C 
> > > >> speed? 
>
>
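Keno's point above about allocation in tight loops can be made concrete with a minimal sketch (my own example, not from the thread). Note that the allocation behavior described in the comments reflects Julia as of this 2016 thread, before dotted calls were fused:

```julia
x = rand(1000)
y = rand(1000)

# Vectorized: at the time of this thread, (x .+ y) allocated a full
# temporary array and .* x allocated the result -- two 1000-element
# arrays for one expression.
z = (x .+ y) .* x

# Devectorized: one fused loop over a preallocated output;
# nothing is allocated inside the loop.
z2 = similar(x)
for i in eachindex(x)
    z2[i] = (x[i] + y[i]) * x[i]
end
```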
