So, what is the Julia way to achieve what is desired here? Is it possible to exploit lexical scoping to advantage without detriment to performance? I am not necessarily concerned about concurrent access or thread safety.
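One pattern that exploits lexical scoping here is a `let` block that captures a preallocated buffer in a closure, so the scratch memory lives in a local scope rather than as a module-level global. This is a minimal sketch, not code from aes.jl: the name `xor_block!`, the trivial kernel, and the 16-byte size are all illustrative, and it is written in current Julia syntax rather than the 2015 syntax used in the thread below.

```julia
# The `let` block creates a local scope; the returned closure keeps
# `scratch` alive and reuses it on every call, so nothing is
# reallocated per invocation.
const xor_block! = let scratch = Vector{UInt8}(undef, 16)
    function (block::Vector{UInt8}, key::UInt8)
        length(block) == length(scratch) ||
            throw(ArgumentError("block must match scratch length"))
        for i in eachindex(block)
            scratch[i] = xor(block[i], key)
        end
        return scratch   # the same buffer every time
    end
end
```

Each call writes into the same captured buffer, so the result must be consumed before the next call; in exchange, the hot loop allocates nothing per block and no global is exposed.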
On Saturday, September 12, 2015 at 12:16:24 AM UTC-4, Yichao Yu wrote:
> On Fri, Sep 11, 2015 at 11:56 PM, Jeffrey Sarnoff <[email protected]> wrote:
> > The way to get a non-global, global-like value is to wrap the whole thing
> > as a module:
> >
> >     module aes
> >     export rijndael
> >
> >     not_a_global = true
> >     # .. functions use not_a_global ...
> >     end # module
>
> Just to be clear, `not_a_global` is a global.
>
> > Wrapping a value in a type gets you a passable, mutable value via the
> > field name:
> >
> >     julia> type PassMe
> >                b::UInt8
> >            end
> >
> >     julia> useMe = PassMe(0x03); useMe.b
> >     0x03
> >
> >     julia> function changeMe!(pass::PassMe, newval::UInt8)
> >                pass.b = newval
> >                true  # pass is not returned
> >            end;
> >
> >     julia> changeMe!(useMe, 0xff); useMe.b
> >     0xff
> >
> > On Friday, September 11, 2015 at 11:34:58 PM UTC-4, Stefan Karpinski wrote:
> >> Oh yeah, I didn't notice that – having those be abstractly typed is going
> >> to be a train wreck for performance. It should be possible to eliminate
> >> all allocation from this kind of code.
> >>
> >> On Fri, Sep 11, 2015 at 10:46 PM, Tom Breloff <[email protected]> wrote:
> >>> I looked at your code for a grand total of 20 seconds, but the one thing
> >>> I would check is whether using concrete types (Int) in your immutable
> >>> helps, or maybe making it parametric:
> >>>
> >>>     immutable AES_cipher_params{T<:Unsigned}
> >>>         bits::T        # Cipher key length, bits
> >>>         nk::T          # Number of 32-bit words, cipher key
> >>>         nb::T          # Number of columns in State
> >>>         nr::T          # Number of rounds
> >>>         block_size::T  # byte length
> >>>         block_x::T     # block dimensions X
> >>>         block_y::T     # block dimensions Y
> >>>     end
> >>>
> >>> It's possible the compiler can't infer enough types because of this?
> >>> Just a guess...
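To make Tom's point concrete, the difference between abstract and concrete field types shows up in a tiny sketch. The type names below are hypothetical, and this uses the post-1.0 `struct` keyword rather than the `immutable` keyword of the thread's era:

```julia
# An abstract field annotation means the compiler cannot infer what
# `p.nr` returns; the field must be boxed and accesses may allocate.
struct SlowParams
    nr::Integer   # abstract type annotation
end

# A concrete field is stored inline and inferred exactly.
struct FastParams
    nr::Int       # concrete type annotation
end

isbitstype(FastParams)  # true: stored inline, no heap allocation needed
isbitstype(SlowParams)  # false: the Integer field forces a box
```

The parametric version Tom shows above achieves the same thing, since `T` is concrete for any given instance.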
> >>> On Fri, Sep 11, 2015 at 10:38 PM, Corey Moncure <[email protected]> wrote:
> >>>> The creator has descended from the shining heavens and responded to my
> >>>> post. Cool.
> >>>>
> >>>> Here are some stats naively gathered by taking time() before and after
> >>>> the run and subtracting. The second run is with gc_disable(). A 1.26x
> >>>> speedup is noted.
> >>>>
> >>>>     elapsed (s): 0.0916
> >>>>     Throughput, KB/s: 1705.97
> >>>>     Average time (μs) per iteration: 0.0
> >>>>     Estimated cycles / iteration @ 4.0 GHz: 36636.0
> >>>>
> >>>>     elapsed (s): 0.0727
> >>>>     Throughput, KB/s: 2149.6
> >>>>     Average time (μs) per iteration: 7.2688
> >>>>     Estimated cycles / iteration @ 4.0 GHz: 29075.0
> >>>>
> >>>> I'd like to know how to minimize the effect of the garbage collector
> >>>> and of allocations. The algorithm is a tight loop with a handful of tiny
> >>>> functions, none of which ought to require much allocation. A few
> >>>> variables for placeholder data are inevitable. But I have read the
> >>>> warnings discouraging the use of global state. What is the Julia way to
> >>>> allocate a few bytes of scratch memory that can be accessed within the
> >>>> scope of, say, apply_ECB_mode!() without having to be reallocated each
> >>>> time gf_mult() or mix_columns!() are called?
> >>>>
> >>>> Also, can a function be forced to mutate variables passed to it,
> >>>> eliminating a redundant assignment? Julia seems happy to do this with
> >>>> Arrays but not with primitives. Pass by reference (pointer)? Does LLVM
> >>>> appreciate this?
> >>>>
> >>>> On Friday, September 11, 2015 at 6:15:52 PM UTC-4, Stefan Karpinski wrote:
> >>>>> There's nothing obviously glaring here. I would definitely recommend
> >>>>> using the built-in profiler to see where it's spending its time. There
> >>>>> may be subtle type instabilities or some other non-obvious issue.
> >>>>> You definitely ought to be able to get within striking distance of
> >>>>> similar C code, which should be within the expected 4-10x of hand-coded
> >>>>> assembly.
> >>>>>
> >>>>> On Fri, Sep 11, 2015 at 5:10 PM, Corey Moncure <[email protected]> wrote:
> >>>>>> https://github.com/cmoncure/crypto/blob/master/aes.jl
> >>>>>>
> >>>>>> In the process of learning Julia (and crypto) I implemented the
> >>>>>> Rijndael block cipher and inverse cipher. I tried to write idiomatic
> >>>>>> yet concise code, but the performance is not very desirable. On my
> >>>>>> machine (i5-2500K @ 4.0 GHz) the throughput is piddling, on the order
> >>>>>> of 10e6 bytes/sec, and memory allocation is at 3056 bytes/block, which
> >>>>>> I have not been able to cut down any further.
> >>>>>>
> >>>>>> Obviously I do not intend to compete with hand-tuned assembler
> >>>>>> routines that heavily exploit SIMD and pre-computed tables, but my
> >>>>>> completely unfounded gut feeling is that, given the right input, Julia
> >>>>>> should be able to come within a factor of 4-10 without such
> >>>>>> optimizations. Currently this routine is within a factor of 1000.
> >>>>>>
> >>>>>> Any Julia experts out there willing to take a peek at the code and
> >>>>>> offer some tips for idiomatic (i.e. within the framework of Julia
> >>>>>> syntax and style) optimizations?
> >>>>>>
> >>>>>> In the course of doing this I have run into several gripes with
> >>>>>> Julia, particularly some of the built-in functions, which are often
> >>>>>> confusing or contradictory by virtue of the type declarations of
> >>>>>> certain methods (or the lack of needed ones). For instance, Julia does
> >>>>>> not support negative indexing of arrays... so then why do so many
> >>>>>> functions on arrays take only signed integer types for dimensions?
> >>>>>> To the noobie it seems like an obvious choice to type data holding
> >>>>>> the calculation of matrix dimensions or indices as unsigned integers,
> >>>>>> given that the language does not support negative indexing. Yet this
> >>>>>> fails unexpectedly in many built-ins such as sub().
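On the question in the thread about forcing a function to mutate a passed value: one hedged answer is `Ref`, which boxes a single value so the callee can write through it; the one-field `PassMe` type in Jeffrey's reply achieves the same thing. A minimal sketch with illustrative names:

```julia
# UInt8 is an immutable bits type, so a callee receives a copy and
# cannot change the caller's binding. A Ref{UInt8} is a mutable
# one-element container; the callee writes through it with r[].
function set_byte!(r::Ref{UInt8}, v::UInt8)
    r[] = v
    return nothing
end

b = Ref{UInt8}(0x03)
set_byte!(b, 0xff)
b[]   # 0xff
```

Note that the `Ref` is itself heap-allocated, so inside a hot loop the usual idiom is either to return the new value and reassign, or to keep the bytes in a preallocated `Vector{UInt8}`, which Julia already passes by reference.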

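As a footnote to the timing numbers in the thread: rather than differencing time() by hand, the built-in @time and @allocated macros report elapsed time and bytes allocated directly. A minimal sketch, with `encrypt_block!` standing in for the real kernel (not code from aes.jl):

```julia
# A stand-in kernel that writes into a preallocated output buffer,
# so the loop itself allocates nothing.
function encrypt_block!(out::Vector{UInt8}, input::Vector{UInt8})
    for i in eachindex(input)
        out[i] = xor(input[i], 0x63)
    end
    return out
end

buf_in  = rand(UInt8, 16)
buf_out = similar(buf_in)

encrypt_block!(buf_out, buf_in)          # warm up (compile) first
@time encrypt_block!(buf_out, buf_in)    # prints time and allocations
bytes = @allocated encrypt_block!(buf_out, buf_in)
```

After the warm-up call, the @allocated figure for a loop like this should be at or near zero; a persistent nonzero number per block, like the 3056 bytes reported above, usually points at type instabilities or hidden temporaries.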