One way to get a value that behaves like a global, without actually being
one, is to wrap the whole thing in a module:
module aes
export rijndael
not_a_global = true
# ... functions here use not_a_global ...
end # module
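One refinement worth noting (a sketch with hypothetical names, not code from the linked aes.jl): if the module-level binding is declared `const`, the compiler can infer its concrete type, so access from the module's functions stays fast. A `const` array's *contents* are still mutable, so it can double as reusable scratch space:

```julia
module AESScratch

# Module-scope buffer bound with const: the binding and type are fixed,
# so access from demo_round! below is type-stable, but the bytes inside
# remain mutable. AESScratch, round_scratch, and demo_round! are
# illustrative names, not from the thread's code.
const round_scratch = zeros(UInt8, 16)

function demo_round!(state::Vector{UInt8})
    # assumes length(state) <= 16
    for i in 1:length(state)
        round_scratch[i] = state[i]         # reuse the same bytes every call
    end
    for i in 1:length(state)
        state[i] = round_scratch[i] + 0x01  # placeholder transformation
    end
    state
end

end # module
```

No allocation happens inside `demo_round!`; the one `zeros` call runs once when the module loads.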
Wrapping a value in a composite type gives you a passable, mutable value,
accessed via the field name:
julia> type PassMe
           b::UInt8
       end

julia> useMe = PassMe(0x03); useMe.b
0x03

julia> function changeMe!(pass::PassMe, newval::UInt8)
           pass.b = newval
           true  # pass is not returned
       end;
julia> changeMe!(useMe, 0xff); useMe.b
0xff
On Friday, September 11, 2015 at 11:34:58 PM UTC-4, Stefan Karpinski wrote:
>
> Oh yeah, I didn't notice that – having those be abstractly typed is going
> to be a trainwreck for performance. It should be possible to eliminate all
> allocation from this kind of code.
>
> On Fri, Sep 11, 2015 at 10:46 PM, Tom Breloff <[email protected]> wrote:
>
>> I looked at your code for a grand total of 20 seconds, but the one thing
>> I would check is whether using concrete types (e.g. Int) in your immutable
>> helps, or maybe making it parametric:
>>
>> immutable AES_cipher_params{T<:Unsigned}
>> bits::T # Cipher key length, bits
>> nk::T # Number of 32-bit words, cipher key
>> nb::T # Number of columns in State
>> nr::T # Number of rounds
>> block_size::T # byte length
>> block_x::T # block dimensions X
>> block_y::T # block dimensions Y
>> end
>>
>> It's possible the compiler can't infer enough types because of this?
>> Just a guess...
>>
>> On Fri, Sep 11, 2015 at 10:38 PM, Corey Moncure <[email protected]> wrote:
>>
>>> The creator has descended from the shining heavens and responded to my
>>> post. Cool.
>>>
>>> Here are some stats naively gathered from taking time() before and after
>>> the run and subtracting. Second run is with gc_disable(). A 1.26x speedup
>>> is noted.
>>>
>>> elapsed (s): 0.0916
>>> Throughput, KB/s: 1705.97
>>> Average time (μs) per iteration: 0.0
>>> Estimated cycles / iteration @ 4.0 GHz: 36636.0
>>>
>>> elapsed (s): 0.0727
>>> Throughput, KB/s: 2149.6
>>> Average time (μs) per iteration: 7.2688
>>> Estimated cycles / iteration @ 4.0 GHz: 29075.0
>>>
>>>
>>> I'd like to know how to minimize the effect of the garbage collector and
>>> allocations. The algorithm is a tight loop with a handful of tiny
>>> functions, none of which ought to require much allocation. A few variables
>>> for placeholder data are inevitable. But I have read the warnings
>>> discouraging the use of global state. What is the Julia way to allocate a
>>> few bytes of scratch memory that can be accessed within the scope of, say,
>>> apply_ECB_mode!() without having to be reallocated each time gf_mult() or
>>> mix_columns!() is called?
>>>
>>> Also, can a function be forced to mutate variables passed to it,
>>> eliminating a redundant assignment? Julia seems happy to do this with
>>> Arrays but not with UInt primitives. Pass by reference (pointer)? Does
>>> LLVM appreciate this?
>>>
>>> On Friday, September 11, 2015 at 6:15:52 PM UTC-4, Stefan Karpinski
>>> wrote:
>>>>
>>>> There's nothing obviously glaring here. I would definitely recommend
>>>> using the built-in profiler to see where it's spending its time. There may
>>>> be subtle type instabilities or some other non-obvious issue. You
>>>> definitely ought to be able to get within striking distance of similar C
>>>> code, which should itself be within the expected 4-10x of hand-coded
>>>> assembly.
>>>>
>>>> On Fri, Sep 11, 2015 at 5:10 PM, Corey Moncure <[email protected]>
>>>> wrote:
>>>>
>>>>> https://github.com/cmoncure/crypto/blob/master/aes.jl
>>>>>
>>>>> In the process of learning Julia (and crypto) I implemented the
>>>>> Rijndael block cipher and inverse cipher. I tried to write idiomatic yet
>>>>> concise code, but the performance is not very desirable. On my machine
>>>>> (i5-2500k @ 4.0 GHz) the throughput is piddling, on the order of 10e6
>>>>> bytes/sec, and memory allocation is 3056 bytes per block, which I have
>>>>> not been able to cut down any further.
>>>>>
>>>>> Obviously I do not intend to compete with hand-tuned assembler
>>>>> routines that heavily exploit SIMD and pre-computed tables, but my
>>>>> completely unfounded gut feeling is that given the right input, Julia
>>>>> should be able to approach within a factor of 4-10 without such
>>>>> optimizations. Currently this routine is off by a factor of about 1000.
>>>>>
>>>>> Any Julia experts out there willing to take a peek at the code and
>>>>> offer some tips for idiomatic (i.e. within the framework of Julia syntax
>>>>> and style) optimizations?
>>>>>
>>>>> In the course of doing this I have run into several gripes with Julia,
>>>>> particularly some of the built-in functions, which are often confusing or
>>>>> contradictory by virtue of the type declarations of certain methods (or
>>>>> the lack of needed ones). For instance, Julia does not support negative
>>>>> indexing of arrays... so why do so many functions on arrays take only
>>>>> signed integer types for dimensions? To the noobie it seems like an
>>>>> obvious choice to type data holding matrix dimensions or indices as
>>>>> unsigned integers, given that the language does not support negative
>>>>> indexing. Yet this fails unexpectedly in many built-ins such as sub().
>>>>>