On Thursday, May 12, 2016 06:44:16 AM Anonymous wrote:
>  Also like I said before, I'm most curious about the current status of
> operations of the form:
> 
> [1 2; 3 4] .* [1 2]
> 
> is such an operation covered by BLAS?

No, among other reasons because BLAS only handles floating-point numbers. That 
specific operation is handled by broadcasting.
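For readers following along, a minimal sketch of what that broadcasting operation produces (the variable names here are illustrative, not from the thread):

```julia
# Broadcasting expands the 1x2 row vector across each row of the 2x2 matrix:
# column j of A is multiplied elementwise by v[j].
A = [1 2; 3 4]
v = [1 2]        # a 1x2 row vector
B = A .* v      # elementwise product, broadcast down the rows

# B == [1 4; 3 8]
```

No BLAS call is involved; broadcasting works for any element type, not just floating point.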

Best,
--Tim

> 
> On Thursday, May 12, 2016 at 5:58:22 AM UTC-7, Tim Holy wrote:
> > Did you run it twice? Remember that memory is allocated during JIT
> > compilation, so the amount of memory on the first call is completely
> > meaningless.
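A minimal illustration of the point about timing twice (the function and sizes here are made up; exact times and allocation counts depend on the Julia version and machine):

```julia
# A small function so the first @time call triggers JIT compilation.
f(A, v) = A .* v

A = rand(3, 3)
v = rand(1, 3)

@time f(A, v)   # first call: includes compilation time and its allocations
@time f(A, v)   # second call: reflects the actual runtime cost
```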
> > 
> > --Tim
> > 
> > On Wednesday, May 11, 2016 11:03:38 PM Anonymous wrote:
> > > In response to both Kristoffer and Keno's timely responses,
> > > 
> > > Originally I just did a simple @time test of the form
> > > Matrix .* horizontal vector
> > > 
> > > and then tested the same thing with for loops, and the for loops were way
> > > faster (and used way less memory)
> > > 
> > > However, I just devectorized one of my algorithms and ran an @time
> > > comparison, and the vectorized version was actually twice as fast as the
> > > devectorized version; however, the vectorized version used way more
> > > memory.
> > 
> > > Clearly I don't really understand the specifics of what makes code slow,
> > > and in particular how vectorized code compares to devectorized code.
> > > 
> > > Vectorized code does seem to use a lot more memory, but clearly for my
> > > algorithm it nevertheless runs faster than the devectorized version. Is
> > > there a reference I could look at that explains this to someone with a
> > > background in math but not much knowledge of computer architecture?
> > > 
> > > On Wednesday, May 11, 2016 at 10:41:55 PM UTC-7, Keno Fischer wrote:
> > > > There seems to be a myth going around that vectorized code in Julia is
> > > > slow. That's not really the case. Oftentimes it's just that
> > > > devectorized code is faster because one can manually perform
> > > > operations such as loop fusion, which the compiler cannot currently
> > > > reason about (and most C compilers can't either). In some other
> > > > languages those benefits get drowned out by language overhead, but in
> > > > Julia those kinds of constructs are generally fast. The cases where
> > > > Julia can be slower are when there is excessive memory allocation in a
> > > > tight inner loop, but those cases can usually be rewritten fairly
> > > > easily without losing the vectorized look of the code.
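To illustrate what manual loop fusion means here (a made-up example, not one from the thread): the vectorized expression allocates a temporary array per operation, while a hand-written loop fuses everything into a single pass with one allocation:

```julia
x = rand(1000)
y = rand(1000)

# Vectorized: allocates a temporary for x .* y, then another for the .+ 1.
z_vec = (x .* y) .+ 1

# Devectorized: one pass over the data, one allocation.
z_loop = similar(x)
for i in eachindex(x)
    z_loop[i] = x[i] * y[i] + 1
end
```

(In later Julia versions, writing `z = x .* y .+ 1` fuses the dotted operations into a single loop automatically; at the time of this thread that wasn't yet available.)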
> > > > 
> > > > On Thu, May 12, 2016 at 1:35 AM, Kristoffer Carlsson
> > > > 
> > > > <[email protected]> wrote:
> > > > > It is always easier to discuss if there is a piece of code to look at.
> > > > > Could you perhaps post a few code examples that do not run as fast as
> > > > > you want?
> > > > 
> > > > > Also, make sure to look at:
> > > > > https://github.com/IntelLabs/ParallelAccelerator.jl. They have a quite
> > > > > sophisticated compiler that does loop fusion and parallelization and
> > > > > other cool stuff.
> > > > > 
> > > > > On Thursday, May 12, 2016 at 7:22:24 AM UTC+2, Anonymous wrote:
> > > > >> This remains one of the main drawbacks of Julia, and the devectorize
> > > > >> package is basically useless as it doesn't support some really
> > > > >> crucial vectorized operations. I'd really prefer not to rewrite all
> > > > >> my vectorized code into nested loops if at all possible, but I really
> > > > >> need more speed. Can anyone tell me the timeline and future plans for
> > > > >> making vectorized code run at C speed?
