Oh yeah, I didn't notice that – having those be abstractly typed is going to be a trainwreck for performance. It should be possible to eliminate all allocation from this kind of code.
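
To make the field-typing point concrete, here is a minimal sketch written in current Julia syntax, where `immutable` is spelled `struct`. The names `ParamsAbstract`, `ParamsConcrete` and `rounds_total` are hypothetical, made up for illustration; they are not taken from aes.jl, and the real parameter type has more fields.

```julia
# Hypothetical parameter types, for illustration only (not from aes.jl).

# Fields declared with an abstract type: the compiler cannot know the
# concrete type of a field when it is read, so it cannot store it inline.
struct ParamsAbstract
    nk::Unsigned   # number of 32-bit words in the cipher key
    nr::Unsigned   # number of rounds
end

# Parametric version: T is fixed per instance, so every field is concrete.
struct ParamsConcrete{T<:Unsigned}
    nk::T
    nr::T
end

# A small loop that reads the fields repeatedly, as a cipher round would.
function rounds_total(p, n)
    acc = 0
    for _ in 1:n
        acc += Int(p.nk) + Int(p.nr)
    end
    return acc
end

pa = ParamsAbstract(UInt8(4), UInt8(10))
pc = ParamsConcrete(UInt8(4), UInt8(10))

# Both compute the same answer; only the generated code differs.
total_a = rounds_total(pa, 3)
total_c = rounds_total(pc, 3)

# Reflection shows the difference in what the compiler knows.
abstract_field_is_concrete = isconcretetype(fieldtype(typeof(pa), :nk))
concrete_field_is_concrete = isconcretetype(fieldtype(typeof(pc), :nk))
abstract_is_bits = isbitstype(typeof(pa))
concrete_is_bits = isbitstype(typeof(pc))
```

With the parametric version, `typeof(pc)` is `ParamsConcrete{UInt8}` and is a plain bits type, which is what lets the compiler keep it out of the heap. Running `@code_warntype rounds_total(pa, 3)` in the REPL should flag the field reads as abstractly typed, which is usually where the allocation in a loop like this comes from.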
On Fri, Sep 11, 2015 at 10:46 PM, Tom Breloff <[email protected]> wrote:

> I looked at your code for a grand total of 20 seconds, but the one thing I
> would check is whether using concrete types (Int) in your immutable, or
> maybe making it parametric, helps:
>
> immutable AES_cipher_params{T<:Unsigned}
>   bits::T # Cipher key length, bits
>   nk::T # Number of 32-bit words, cipher key
>   nb::T # Number of columns in State
>   nr::T # Number of rounds
>   block_size::T # byte length
>   block_x::T # block dimensions X
>   block_y::T # block dimensions Y
> end
>
> It's possible the compiler can't infer enough types because of this? Just
> a guess...
>
> On Fri, Sep 11, 2015 at 10:38 PM, Corey Moncure <[email protected]>
> wrote:
>
>> The creator has descended from the shining heavens and responded to my
>> post. Cool.
>>
>> Here are some stats naively gathered from taking time() before and after
>> the run and subtracting. Second run is with gc_disable(). A 1.26x speedup
>> is noted.
>>
>> elapsed (s): 0.0916
>> Throughput, KB/s: 1705.97
>> Average time (μs) per iteration: 0.0
>> Estimated cycles / iteration @ 4.0 GHz: 36636.0
>>
>> elapsed (s): 0.0727
>> Throughput, KB/s: 2149.6
>> Average time (μs) per iteration: 7.2688
>> Estimated cycles / iteration @ 4.0 GHz: 29075.0
>>
>>
>> I'd like to know how to minimize the effect of the garbage collector and
>> allocations. The algorithm is a tight loop with a handful of tiny
>> functions, none of which ought to require much allocation. A few variables
>> for placeholder data are inevitable. But I have read the warnings
>> discouraging the use of global state. What is the Julia way to allocate a
>> few bytes of scratch memory that can be accessed within the scope of, say,
>> apply_ECB_mode!() without having to be reallocated each time gf_mult() or
>> mix_columns! are called?
>>
>> Also, can a function be forced to mutate variables passed to it,
>> eliminating a redundant assignment? Julia seems happy to do this with
>> Arrays but not unit primitives. Pass by reference (pointer)? Does the
>> LLVM appreciate this?
>>
>> On Friday, September 11, 2015 at 6:15:52 PM UTC-4, Stefan Karpinski wrote:
>>>
>>> There's nothing obviously glaring here. I would definitely recommend
>>> using the built-in profiler to see where it's spending its time. There may
>>> be subtle type instabilities or some other non-obvious issue. You
>>> definitely ought to be able to get within striking distance of similar C
>>> code, which should be in the expected 4-10x slower than hand-coded assembly.
>>>
>>> On Fri, Sep 11, 2015 at 5:10 PM, Corey Moncure <[email protected]>
>>> wrote:
>>>
>>>> https://github.com/cmoncure/crypto/blob/master/aes.jl
>>>>
>>>> In the process of learning Julia (and crypto) I implemented the
>>>> Rijndael block cipher and inverse cipher. I tried to write idiomatic yet
>>>> concise code, but the performance is not very desirable. On my machine
>>>> (i5-2500k @ 4.0 GHz) the throughput is piddling, on the order of 10e6
>>>> bytes/sec, and memory allocation is at 3056 bytes / block, which I have not
>>>> been able to cut down any further.
>>>>
>>>> Obviously I do not intend to compete with hand-tuned assembler routines
>>>> that heavily exploit SIMD and pre-computed tables, but my completely
>>>> unfounded gut feeling is that given the right input, Julia should be able
>>>> to approach within a factor of 4-10 without such optimizations. Currently
>>>> this routine is within a factor of 1000.
>>>>
>>>> Any Julia experts out there willing to take a peek at the code and
>>>> offer some tips for idiomatic (i.e. within the framework of Julia syntax
>>>> and style) optimizations?
>>>>
>>>> In the course of doing this I have run into several gripes with Julia,
>>>> particularly some of the built-in functions which are often confusing or
>>>> contradictory by virtue of the type declarations of certain methods (or
>>>> lack of needed ones). For instance, Julia does not support negative
>>>> indexing of arrays... so then why do so many functions on arrays take only
>>>> signed integer types for dimensions? To the noobie it seems like an
>>>> obvious choice to type data holding the calculation of matrix dimensions or
>>>> indices as unsigned integers, given that the language does not support
>>>> negative indexing. Yet this fails unexpectedly in many built-ins such as
>>>> sub().
>>>>
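
On the scratch-memory and mutation questions quoted above, one common pattern is to allocate the buffer once in the caller and pass it down to the `!`-suffixed helpers, rather than reaching for global state. The following is a minimal sketch in current Julia syntax, not a patch for aes.jl: `xor_block!`, `process_blocks!`, `bump` and `bump!` are hypothetical stand-ins, and the XOR is only a placeholder for the real round functions.

```julia
# Hypothetical stand-ins for the helpers in aes.jl; illustration only.

# Mutates `state` in place, using caller-supplied scratch space `tmp`.
function xor_block!(state::Vector{UInt8}, tmp::Vector{UInt8}, key::Vector{UInt8})
    for i in eachindex(state, tmp, key)   # also checks that the axes match
        tmp[i] = xor(state[i], key[i])
    end
    copyto!(state, tmp)                   # write back without a new array
    return state
end

# The driver allocates the scratch buffer once and reuses it for every block.
function process_blocks!(blocks::Vector{Vector{UInt8}}, key::Vector{UInt8})
    tmp = similar(key)                    # single allocation, outside the loop
    for b in blocks
        xor_block!(b, tmp, key)
    end
    return blocks
end

# Scalars are immutable values, so a function cannot change the caller's
# variable. Option 1: return the new value and rebind it at the call site.
bump(x::UInt8) = x + 0x01

# Option 2: wrap the scalar in a Ref when shared mutable state is really needed.
function bump!(r::Ref{UInt8})
    r[] += 0x01
    return r
end

key    = fill(0xff, 16)
blocks = [fill(0x0f, 16) for _ in 1:4]
process_blocks!(blocks, key)              # every byte becomes xor(0x0f, 0xff)

x = 0x01
x = bump(x)                               # rebinding, no pointer involved

r = Ref(0x01)
bump!(r)                                  # r[] is updated in place
```

Rebinding a returned scalar is normally the cheaper choice, since a small immutable value can stay in a register, whereas a `Ref` is a mutable container. Passing the scratch array as an argument also keeps it visible to type inference, which a non-constant global would not be.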
