So, what is the Julia way to achieve what is desired here? Is it possible to exploit lexical scoping to advantage without detriment to performance? I am not necessarily concerned about concurrent access or thread safety.
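One pattern that exploits lexical scoping here is a `let` block that captures a preallocated buffer in a closure, so the scratch memory lives in a local scope rather than as a module-level global. This is a minimal sketch, not code from aes.jl: the name `xor_block!`, the trivial kernel, and the 16-byte size are all illustrative, and it is written in current Julia syntax rather than the 2015 syntax used in the thread below.

```julia
# The `let` block creates a local scope; the returned closure keeps
# `scratch` alive and reuses it on every call, so nothing is
# reallocated per invocation.
const xor_block! = let scratch = Vector{UInt8}(undef, 16)
    function (block::Vector{UInt8}, key::UInt8)
        length(block) == length(scratch) ||
            throw(ArgumentError("block must match scratch length"))
        for i in eachindex(block)
            scratch[i] = xor(block[i], key)
        end
        return scratch   # the same buffer every time
    end
end
```

Each call writes into the same captured buffer, so the result must be consumed before the next call; in exchange, the hot loop allocates nothing per block and no global is exposed.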
On Saturday, September 12, 2015 at 12:16:24 AM UTC-4, Yichao Yu wrote:
> On Fri, Sep 11, 2015 at 11:56 PM, Jeffrey Sarnoff <[email protected]> wrote:
> > The way to get a non-global, global-like value is to wrap the whole thing
> > as a module:
> >
> >     module aes
> >     export rijndael
> >
> >     not_a_global = true
> >     # .. functions use not_a_global ...
> >     end # module
>
> Just to be clear, `not_a_global` is a global.
>
> > Wrapping a value in a type gets you a passable, mutable value via the
> > field name:
> >
> >     julia> type PassMe
> >                b::UInt8
> >            end
> >
> >     julia> useMe = PassMe(0x03); useMe.b
> >     0x03
> >
> >     julia> function changeMe!(pass::PassMe, newval::UInt8)
> >                pass.b = newval
> >                true  # pass is not returned
> >            end;
> >
> >     julia> changeMe!(useMe, 0xff); useMe.b
> >     0xff
> >
> > On Friday, September 11, 2015 at 11:34:58 PM UTC-4, Stefan Karpinski wrote:
> >> Oh yeah, I didn't notice that – having those be abstractly typed is going
> >> to be a train wreck for performance. It should be possible to eliminate
> >> all allocation from this kind of code.
> >>
> >> On Fri, Sep 11, 2015 at 10:46 PM, Tom Breloff <[email protected]> wrote:
> >>> I looked at your code for a grand total of 20 seconds, but the one thing
> >>> I would check is whether using concrete types (Int) in your immutable
> >>> helps, or maybe making it parametric:
> >>>
> >>>     immutable AES_cipher_params{T<:Unsigned}
> >>>         bits::T        # Cipher key length, bits
> >>>         nk::T          # Number of 32-bit words, cipher key
> >>>         nb::T          # Number of columns in State
> >>>         nr::T          # Number of rounds
> >>>         block_size::T  # byte length
> >>>         block_x::T     # block dimensions X
> >>>         block_y::T     # block dimensions Y
> >>>     end
> >>>
> >>> It's possible the compiler can't infer enough types because of this?
> >>> Just a guess...
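To make Tom's point concrete, the difference between abstract and concrete field types shows up in a tiny sketch. The type names below are hypothetical, and this uses the post-1.0 `struct` keyword rather than the `immutable` keyword of the thread's era:

```julia
# An abstract field annotation means the compiler cannot infer what
# `p.nr` returns; the field must be boxed and accesses may allocate.
struct SlowParams
    nr::Integer   # abstract type annotation
end

# A concrete field is stored inline and inferred exactly.
struct FastParams
    nr::Int       # concrete type annotation
end

isbitstype(FastParams)  # true: stored inline, no heap allocation needed
isbitstype(SlowParams)  # false: the Integer field forces a box
```

The parametric version Tom shows above achieves the same thing, since `T` is concrete for any given instance.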
> >>> On Fri, Sep 11, 2015 at 10:38 PM, Corey Moncure <[email protected]> wrote:
> >>>> The creator has descended from the shining heavens and responded to my
> >>>> post. Cool.
> >>>>
> >>>> Here are some stats naively gathered by taking time() before and after
> >>>> the run and subtracting. The second run is with gc_disable(). A 1.26x
> >>>> speedup is noted.
> >>>>
> >>>>     elapsed (s): 0.0916
> >>>>     Throughput, KB/s: 1705.97
> >>>>     Average time (μs) per iteration: 0.0
> >>>>     Estimated cycles / iteration @ 4.0 GHz: 36636.0
> >>>>
> >>>>     elapsed (s): 0.0727
> >>>>     Throughput, KB/s: 2149.6
> >>>>     Average time (μs) per iteration: 7.2688
> >>>>     Estimated cycles / iteration @ 4.0 GHz: 29075.0
> >>>>
> >>>> I'd like to know how to minimize the effect of the garbage collector
> >>>> and of allocations. The algorithm is a tight loop with a handful of tiny
> >>>> functions, none of which ought to require much allocation. A few
> >>>> variables for placeholder data are inevitable. But I have read the
> >>>> warnings discouraging the use of global state. What is the Julia way to
> >>>> allocate a few bytes of scratch memory that can be accessed within the
> >>>> scope of, say, apply_ECB_mode!() without having to be reallocated each
> >>>> time gf_mult() or mix_columns!() are called?
> >>>>
> >>>> Also, can a function be forced to mutate variables passed to it,
> >>>> eliminating a redundant assignment? Julia seems happy to do this with
> >>>> Arrays but not with primitives. Pass by reference (pointer)? Does LLVM
> >>>> appreciate this?
> >>>>
> >>>> On Friday, September 11, 2015 at 6:15:52 PM UTC-4, Stefan Karpinski wrote:
> >>>>> There's nothing obviously glaring here. I would definitely recommend
> >>>>> using the built-in profiler to see where it's spending its time. There
> >>>>> may be subtle type instabilities or some other non-obvious issue.
> >>>>> You definitely ought to be able to get within striking distance of
> >>>>> similar C code, which should be within the expected 4-10x of hand-coded
> >>>>> assembly.
> >>>>>
> >>>>> On Fri, Sep 11, 2015 at 5:10 PM, Corey Moncure <[email protected]> wrote:
> >>>>>> https://github.com/cmoncure/crypto/blob/master/aes.jl
> >>>>>>
> >>>>>> In the process of learning Julia (and crypto) I implemented the
> >>>>>> Rijndael block cipher and inverse cipher. I tried to write idiomatic
> >>>>>> yet concise code, but the performance is not very desirable. On my
> >>>>>> machine (i5-2500K @ 4.0 GHz) the throughput is piddling, on the order
> >>>>>> of 10e6 bytes/sec, and memory allocation is at 3056 bytes/block, which
> >>>>>> I have not been able to cut down any further.
> >>>>>>
> >>>>>> Obviously I do not intend to compete with hand-tuned assembler
> >>>>>> routines that heavily exploit SIMD and pre-computed tables, but my
> >>>>>> completely unfounded gut feeling is that, given the right input, Julia
> >>>>>> should be able to come within a factor of 4-10 without such
> >>>>>> optimizations. Currently this routine is within a factor of 1000.
> >>>>>>
> >>>>>> Any Julia experts out there willing to take a peek at the code and
> >>>>>> offer some tips for idiomatic (i.e. within the framework of Julia
> >>>>>> syntax and style) optimizations?
> >>>>>>
> >>>>>> In the course of doing this I have run into several gripes with
> >>>>>> Julia, particularly some of the built-in functions, which are often
> >>>>>> confusing or contradictory by virtue of the type declarations of
> >>>>>> certain methods (or the lack of needed ones). For instance, Julia does
> >>>>>> not support negative indexing of arrays... so then why do so many
> >>>>>> functions on arrays take only signed integer types for dimensions?
> >>>>>> To the noobie it seems like an obvious choice to type data holding
> >>>>>> the calculation of matrix dimensions or indices as unsigned integers,
> >>>>>> given that the language does not support negative indexing. Yet this
> >>>>>> fails unexpectedly in many built-ins such as sub().
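On the question in the thread about forcing a function to mutate a passed value: one hedged answer is `Ref`, which boxes a single value so the callee can write through it; the one-field `PassMe` type in Jeffrey's reply achieves the same thing. A minimal sketch with illustrative names:

```julia
# UInt8 is an immutable bits type, so a callee receives a copy and
# cannot change the caller's binding. A Ref{UInt8} is a mutable
# one-element container; the callee writes through it with r[].
function set_byte!(r::Ref{UInt8}, v::UInt8)
    r[] = v
    return nothing
end

b = Ref{UInt8}(0x03)
set_byte!(b, 0xff)
b[]   # 0xff
```

Note that the `Ref` is itself heap-allocated, so inside a hot loop the usual idiom is either to return the new value and reassign, or to keep the bytes in a preallocated `Vector{UInt8}`, which Julia already passes by reference.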

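As a footnote to the timing numbers in the thread: rather than differencing time() by hand, the built-in @time and @allocated macros report elapsed time and bytes allocated directly. A minimal sketch, with `encrypt_block!` standing in for the real kernel (not code from aes.jl):

```julia
# A stand-in kernel that writes into a preallocated output buffer,
# so the loop itself allocates nothing.
function encrypt_block!(out::Vector{UInt8}, input::Vector{UInt8})
    for i in eachindex(input)
        out[i] = xor(input[i], 0x63)
    end
    return out
end

buf_in  = rand(UInt8, 16)
buf_out = similar(buf_in)

encrypt_block!(buf_out, buf_in)          # warm up (compile) first
@time encrypt_block!(buf_out, buf_in)    # prints time and allocations
bytes = @allocated encrypt_block!(buf_out, buf_in)
```

After the warm-up call, the @allocated figure for a loop like this should be at or near zero; a persistent nonzero number per block, like the 3056 bytes reported above, usually points at type instabilities or hidden temporaries.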