On Wednesday, May 11, 2016 at 23:03 -0700, Anonymous wrote:
> In response to both Kristoffer's and Keno's timely responses:
> 
> Originally I just did a simple @time test of the form
> matrix .* row vector
> 
> and then tested the same thing with for loops, and the for loops were
> way faster (and used way less memory).
> 
> However, I just devectorized one of my algorithms and ran an @time
> comparison, and the vectorized version was actually twice as fast as
> the devectorized version, though it used far more memory.  Clearly I
> don't really understand the specifics of what makes code slow, and in
> particular how vectorized code compares to devectorized code.
> Vectorized code does seem to use a lot more memory, yet for my
> algorithm it nevertheless runs faster than the devectorized version.
> Is there a reference I could look at that explains this to someone
> with a background in math but not much knowledge of computer
> architecture?
I don't know of a reference, but I suspect this is due to BLAS.
Vectorized linear algebra operations like matrix multiplication are
highly optimized and run several threads in parallel. On the other
hand, your devectorized code isn't carefully tuned for a specific
processor model, and it uses a single CPU core (Julia will soon
support running several threads natively; see also [1]).
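To make the gap concrete, here is a minimal sketch (in current Julia
syntax, which postdates this thread; the function name `naive_mul` is
mine) comparing the BLAS-backed `*` against an untuned triple loop:

```julia
using LinearAlgebra  # A * B for Float64 matrices dispatches to BLAS (dgemm)

# Naive, single-threaded, untuned matrix multiply.
function naive_mul(A::Matrix{Float64}, B::Matrix{Float64})
    m, k = size(A)
    k2, n = size(B)
    @assert k == k2
    C = zeros(m, n)
    for j in 1:n, l in 1:k, i in 1:m  # column-major-friendly loop order
        C[i, j] += A[i, l] * B[l, j]
    end
    return C
end

A = rand(500, 500)
B = rand(500, 500)

@time A * B            # multithreaded, cache-blocked BLAS: fast
@time naive_mul(A, B)  # one core, no blocking or SIMD tuning: much slower
```

The two results agree up to floating-point rounding; the BLAS call is
typically an order of magnitude faster despite allocating its output
just like the naive version does.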

So depending on the particular operations you're running, the
vectorized form can be faster even though it allocates more memory. In
general, it will likely be faster to use BLAS for expensive operations
on large matrices. OTOH, it's better to devectorize code if you
successively perform several simple operations on an array, because
each operation currently allocates a copy of the array (this may well
change with [2]).
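For example (a sketch; the helper name `axpb` is illustrative), two
chained elementwise steps each allocate a full temporary array, while
a hand-written loop makes a single pass. ([2] was later addressed by
dot-broadcast fusion, which fuses an expression like `a .* x .+ b`
into one loop automatically.)

```julia
x = rand(10^6)
a, b = 2.0, 1.0

# Each vectorized step allocates a fresh million-element array:
t = a .* x        # temporary no. 1
y_vec = t .+ b    # temporary no. 2 (the result)

# Devectorized: one output array, one pass, no intermediate temporary.
function axpb(a, x, b)
    y = similar(x)
    for i in eachindex(x)
        y[i] = a * x[i] + b
    end
    return y
end

y_loop = axpb(a, x, b)
```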


Regards


1: http://julialang.org/blog/2016/03/parallelaccelerator
2: https://github.com/JuliaLang/julia/issues/16285

> > There seems to be a myth going around that vectorized code in Julia
> > is slow. That's not really the case. Often it's just that
> > devectorized code is faster because one can manually perform
> > operations such as loop fusion, which the compiler cannot currently
> > reason about (and most C compilers can't either). In some other
> > languages those benefits get drowned out by language overhead, but
> > in Julia those kinds of constructs are generally fast. The cases
> > where Julia can be slower are when there is excessive memory
> > allocation in a tight inner loop, but those cases can usually be
> > rewritten fairly easily without losing the vectorized look of the
> > code.
> > 
> > On Thu, May 12, 2016 at 1:35 AM, Kristoffer Carlsson
> > <[email protected]> wrote:
> > > It is always easier to discuss if there is a piece of code to look
> > > at. Could you perhaps post a few code examples that do not run as
> > > fast as you want?
> > >
> > > Also, make sure to look at
> > > https://github.com/IntelLabs/ParallelAccelerator.jl. They have a
> > > quite sophisticated compiler that does loop fusion,
> > > parallelization, and other cool stuff.
> > >
> > > On Thursday, May 12, 2016 at 7:22:24 AM UTC+2, Anonymous wrote:
> > >>
> > >> This remains one of the main drawbacks of Julia, and the
> > >> Devectorize package is basically useless as it doesn't support
> > >> some really crucial vectorized operations.  I'd really prefer not
> > >> to rewrite all my vectorized code into nested loops if at all
> > >> possible, but I really need more speed.  Can anyone tell me the
> > >> timeline and future plans for making vectorized code run at C
> > >> speed?
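Keno's point above, that excessive allocation in a tight loop can
usually be fixed without fully devectorizing, can be sketched like
this (modern syntax; `mul!` is today's in-place multiply in
`LinearAlgebra`, spelled `A_mul_B!` at the time of this thread, and
power iteration is just a stand-in example):

```julia
using LinearAlgebra

# Power iteration, allocating two fresh vectors on every pass:
function power_alloc(A, x, iters)
    for _ in 1:iters
        x = A * x          # allocates a new vector
        x = x / norm(x)    # and another
    end
    return x
end

# Same computation; buffers allocated once, loop body still reads
# like the vectorized version.
function power_inplace(A, x, iters)
    x = copy(x)            # don't mutate the caller's vector
    y = similar(x)
    for _ in 1:iters
        mul!(y, A, x)      # y = A * x, written in place
        y ./= norm(y)      # in-place normalization
        x, y = y, x        # swap buffers instead of allocating
    end
    return x
end
```

Both functions return the same vector; the second allocates two
buffers total instead of two per iteration.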
